ADAPTIVE POINTWISE ESTIMATION FOR PURE JUMP LÉVY PROCESSES
Abstract. This paper is concerned with adaptive kernel estimation of the Lévy density
$N(x)$ for bounded-variation pure-jump Lévy processes. The sample path is observed at $n$
discrete instants in the "high frequency" context ($\Delta = \Delta(n)$ tends to zero while
$n\Delta$ tends to infinity). We construct a collection of kernel estimators of the function
$g(x) = xN(x)$ and propose a method of local adaptive selection of the bandwidth. We provide
an oracle inequality and a rate of convergence for the quadratic pointwise risk. This rate is
proved to be the optimal minimax rate. We give examples and simulation results for processes
fitting in our framework. We also consider the case of irregular sampling.
1. Introduction

Consider $(L_t, t \ge 0)$ a real-valued Lévy process with characteristic function given by:

(1)    $\psi_t(u) = E(\exp(iuL_t)) = \exp\left(t \int_{\mathbb{R}} (e^{iux} - 1)\, N(x)\,dx\right).$

We assume that the Lévy measure admits a density $N$ and that the function $g(x) = xN(x)$
is integrable. Under these assumptions, $(L_t, t \ge 0)$ is a pure jump Lévy process without
drift and with finite variation on compact sets. Moreover $E(|L_t|) < \infty$ (see Bertoin
(1996)). Suppose that we have discrete observations $(L_{k\Delta}, k = 1, \dots, n)$ with
sampling interval $\Delta$. Our aim in this paper is the nonparametric adaptive kernel
estimation of the function $g(x) = xN(x)$ based on these observations under the asymptotic
framework $n$ tends to $\infty$. This subject has been recently investigated by several authors.
Figueroa-López and Houdré (2006) use a penalized projection method to estimate the
Lévy density on a compact set separated from 0. Other authors develop an estimation
procedure based on empirical estimations of the characteristic function $\psi_\Delta(u)$ of the
increments $(Z_k^\Delta = L_{k\Delta} - L_{(k-1)\Delta}, k = 1, \dots, n)$ and its derivatives, followed by a
Fourier inversion to recover the Lévy density. For low frequency data ($\Delta$ is fixed), we can
quote Watteel and Kulperger (2003) or Jongbloed and van der Meulen (2006) for a parametric
study. Still in the low frequency framework, Neumann and Reiß (2009) estimate
$\nu(x) = x^2 N(x)$ in the more general case with drift and volatility, and Comte and
Genon-Catalot (2010b) use model selection to build an adaptive estimator. An adaptive method
to estimate linear functionals is also given in Kappus (2012). Belomestny (2011) addresses the
issue of inference for time-changed Lévy processes with results in terms of uniform and
pointwise distance.

In the high frequency context, which is our concern in this paper, the problem is simpler
since, for any fixed $u$, $\psi_\Delta(u) \to 1$ when $\Delta \to 0$. This implies that $\psi_\Delta(u)$ need not be
estimated and can simply be replaced by 1 in the estimation procedures. This is what is
done in Comte and Genon-Catalot (2009). These authors start from the equality:



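(2)    $E\left[Z_k^\Delta e^{iuZ_k^\Delta}\right] = \Delta\, \psi_\Delta(u)\, g^*(u)$

obtained by differentiating (1). Here $g^*(u) = \int e^{iux} g(x)\,dx$ is the Fourier transform of
$g$, well defined since we assume $g$ integrable. Then, as $\psi_\Delta(u) \simeq 1$, equation (2) writes
$E\left[Z_k^\Delta e^{iuZ_k^\Delta}\right] \simeq \Delta g^*(u)$. This gives an estimator of $g^*(u)$ as follows:

    $\frac{1}{n\Delta} \sum_{k=1}^{n} Z_k^\Delta e^{iuZ_k^\Delta}.$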



Now, to recover g, the authors apply Fourier inversion with cutoff parameter m. Here, we 
rather introduce a kernel to make inversion possible: 



    $\frac{1}{n\Delta} \sum_{k=1}^{n} Z_k^\Delta K^*(uh)\, e^{iuZ_k^\Delta}$

which is in fact the Fourier transform of $\frac{1}{nh\Delta} \sum_{k=1}^{n} Z_k^\Delta K((x - Z_k^\Delta)/h)$. In the end,
in the high frequency context, a direct method without Fourier inversion can be applied.
Indeed, a consequence of (2) is that the empirical distribution

    $\hat\mu_n(dz) = \frac{1}{n\Delta} \sum_{k=1}^{n} Z_k^\Delta\, \delta_{Z_k^\Delta}(dz)$

weakly converges to $g(z)\,dz$ (note that the idea of exploiting this weak convergence is
already present in Figueroa-López (2009b)). This suggests considering kernel estimators
of $g$ of the form



(3)    $\hat g_h(x) = K_h \star \hat\mu_n(x) = \frac{1}{n\Delta} \sum_{k=1}^{n} Z_k^\Delta K_h(x - Z_k^\Delta)$

where $K_h(x) = (1/h)K(x/h)$ and $K$ is a kernel such that $\int K = 1$. Below, we study the
quadratic pointwise risk of the estimators $\hat g_h(x)$ and evaluate the rate of convergence of this
risk as $n$ tends to infinity, $\Delta = \Delta(n)$ tends to 0 and $h = h(n)$ tends to 0. This is done under
Hölder regularity assumptions for the function $g$. Note that a pointwise study involving a
kernel estimator can be found in van Es et al. (2007) for more specific compound Poisson
processes, but the estimator is different from ours, as well as the observation scheme. In
Figueroa-López (2011) a pointwise central limit theorem is given for the estimation of the
Lévy density, as well as confidence intervals. Still in the high frequency context, we can
cite Duval (2012) for the estimation of a compound Poisson process with low conditions
on $\Delta$, but for integrated distance.
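
As an illustration, the kernel estimator (3) can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the Gaussian kernel and the names `gaussian_kernel` and `g_hat` are our own illustrative choices and are not prescribed by the text.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard Gaussian density, used here as an illustrative kernel K."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)


def g_hat(x, increments, delta, h, kernel=gaussian_kernel):
    """Kernel estimator (3): 1/(n*delta) * sum_k Z_k * K_h(x - Z_k).

    x          : evaluation point(s)
    increments : observed increments Z_k of the process
    delta      : sampling interval
    h          : bandwidth
    """
    z = np.asarray(increments, dtype=float)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    # K_h(x - Z_k) for every evaluation point (rows) and increment (columns)
    weights = kernel((x[:, None] - z[None, :]) / h) / h
    return (weights * z[None, :]).sum(axis=1) / (z.size * delta)
```

For instance, `g_hat(np.linspace(0.1, 2.0, 50), z, delta, h)` evaluates the estimator on a grid of 50 points for an array `z` of increments; the estimate of the Lévy density itself at a point $x \neq 0$ is then obtained by dividing by $x$.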

In this paper, we study local adaptive bandwidth selection (which the previous authors
do not consider). For a given non-zero real $x_0$, we select a bandwidth $\hat h(x_0)$ such
that the resulting adaptive estimator $\hat g_{\hat h(x_0)}(x_0)$ automatically reaches the optimal rate of
convergence corresponding to the unknown regularity of the function $g$. The method of
bandwidth selection follows the scheme developed by Goldenshluger and Lepski (2011)
for density estimation. The advantage of our kernel method is that it allows us to estimate
the Lévy density at a fixed point, with a local adaptive choice. This method is easy to
implement, and we show its good numerical performance on different examples. Moreover
our contribution includes an alternative proof for a lower bound result (see Figueroa-López)
which proves the optimality of the rate for this pointwise estimation. We also
study the framework of irregular sampling.

In Section 2 we give notations and assumptions. In Section 3 we study the pointwise
mean square error (MSE) of $\hat g_h(x_0)$ given in (3) for $g$ belonging to a Hölder class of
regularity $\beta$ and we present the bandwidth selection method together with both lower
and upper risk bounds for our adaptive estimator. The rate of convergence of the risk
is $(\log(n\Delta)/n\Delta)^{2\beta/(2\beta+1)}$, which is expected in the adaptive pointwise context. Examples and
simulations in our framework are discussed in Section 4. The case of irregular sampling is
addressed in Section 5 and proofs are gathered in Section 6.

2. Notations and assumptions 

We present the assumptions on the kernel $K$ and on the function $g$ required to study
the estimator given by (3). First, we set some notations. For any functions $u$, $v$, we denote
by $u^*$ the Fourier transform of $u$, $u^*(y) = \int e^{iyx} u(x)\,dx$, and by $\|u\|$, $\langle u, v \rangle$, $u \star v$ the
quantities

    $\|u\|^2 = \int |u(x)|^2\,dx,$

    $\langle u, v \rangle = \int u(x)\overline{v}(x)\,dx$ with $z\overline{z} = |z|^2$, and $u \star v(x) = \int u(y)v(x - y)\,dy.$

For a positive real $\beta$, $\lfloor\beta\rfloor$ denotes the largest integer strictly smaller than $\beta$. Let us also
define the following functional space:

Definition 2.1. (Hölder class) Let $\beta > 0$, $L > 0$ and let $l = \lfloor\beta\rfloor$. The Hölder class
$\mathcal{H}(\beta, L)$ on $\mathbb{R}$ is the set of all functions $f : \mathbb{R} \to \mathbb{R}$ such that the derivative $f^{(l)}$ exists and
verifies:

    $|f^{(l)}(x) - f^{(l)}(y)| \le L|x - y|^{\beta - l}, \quad \forall x, y \in \mathbb{R}.$

We can now define the assumptions concerning the target function $g$:

G1: $g \in L^2$

G2: $g^*$ is differentiable almost everywhere and its derivative belongs to $L^1$

G3(p): For $p$ integer, $\int |x|^{p-1} |g(x)|\,dx < \infty$

G4($\beta$): $g \in \mathcal{H}(\beta, L)$

G5: $g'$ exists and is uniformly bounded

The first assumption is natural for Fourier analysis, as is G3(1). Assumption
G3(p) ensures that $E|Z_k^\Delta|^p < \infty$. G4 is a classical regularity assumption in nonparametric
estimation; it allows us to quantify the bias (see Tsybakov (2009)). Note that G5 implies
that $g \in \mathcal{H}(1, L')$ so we can assume $\beta \ge 1$.

Now let us describe which kind of kernel we choose for our estimator. For $m \ge 1$ an
integer, we say that $K : \mathbb{R} \to \mathbb{R}$ is a kernel of order $m$ if the functions $u \mapsto u^j K(u)$,
$j = 0, 1, \dots, m$, are integrable and satisfy

(4)    $\int K(u)\,du = 1, \quad \int u^j K(u)\,du = 0, \quad j \in \{1, \dots, m\}.$

Let us define the following conditions:

K1: $K$ belongs to $L^1 \cap L^2 \cap L^\infty$ and $K^* \in L^1$

K2($\beta$): The kernel $K$ is of order $l = \lfloor\beta\rfloor$ and $\int |x|^\beta |K(x)|\,dx < +\infty$

These assumptions are standard when working on problems of estimation by kernel
methods. Note that there is a way to build a kernel of order $l$. Indeed, let $u$ be a bounded
integrable function such that $u \in L^2$, $u^* \in L^1$ and $\int u(y)\,dy = 1$, and set for any given
integer $l$,

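(5)    $K(t) = \sum_{k=1}^{l} \binom{l}{k} (-1)^{k+1} \frac{1}{k}\, u\!\left(\frac{t}{k}\right).$

The kernel $K$ defined by (5) is a kernel of order $l$ which also satisfies K1 (see Kerkyacharian et al.
(2001) and Goldenshluger and Lepski (2011)). As usual, we define

    $\forall x \in \mathbb{R}, \quad K_h(x) = \frac{1}{h} K\!\left(\frac{x}{h}\right).$

In all the following we fix $x_0 \in \mathbb{R}$, $x_0 \neq 0$.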
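
The construction of a higher-order kernel from a base function $u$ can be checked numerically. The sketch below assumes that the construction takes the form $K(t) = \sum_{k=1}^{l} \binom{l}{k} (-1)^{k+1} k^{-1} u(t/k)$ used in the cited references, with $u$ the Gaussian density; the function names and the integration grid are illustrative assumptions. It verifies that the kernel integrates to one and that its moments of order 1 to $l - 1$ vanish.

```python
import numpy as np
from math import comb


def gaussian_density(t):
    """Base function u: the standard Gaussian density."""
    return np.exp(-0.5 * t**2) / np.sqrt(2.0 * np.pi)


def higher_order_kernel(t, l, u=gaussian_density):
    """K(t) = sum_{k=1}^{l} C(l, k) * (-1)^(k+1) * u(t / k) / k (assumed form)."""
    t = np.asarray(t, dtype=float)
    total = np.zeros_like(t)
    for k in range(1, l + 1):
        total += comb(l, k) * (-1) ** (k + 1) * u(t / k) / k
    return total


def moment(j, l, half_width=60.0, num=600001):
    """Riemann-sum approximation of the integral of t^j K(t) dt."""
    grid = np.linspace(-half_width, half_width, num)
    step = grid[1] - grid[0]
    return float(np.sum(grid**j * higher_order_kernel(grid, l)) * step)
```

With `l = 3`, `moment(0, 3)` is numerically one while `moment(1, 3)` and `moment(2, 3)` are numerically zero, which is the cancellation that reduces the usual bias term for smooth $g$.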

3. Risk bound 

3.1. Risk bound for a fixed bandwidth. In this subsection, the bandwidth $h$ is fixed,
thus we omit the subscript $h$ for the sake of simplicity: we denote $\hat g = \hat g_h$. The usual bias
variance decomposition of the Mean Squared Error yields:

    $MSE(x_0, h) := E[(\hat g(x_0) - g(x_0))^2] = E[(\hat g(x_0) - E[\hat g(x_0)])^2] + (E[\hat g(x_0)] - g(x_0))^2.$

But the bias needs further decomposition:

    $b(x_0)^2 := (E[\hat g(x_0)] - g(x_0))^2 \le 2 b_1(x_0)^2 + 2 b_2(x_0)^2$

with the usual bias,

    $b_1(x_0) = K_h \star g(x_0) - g(x_0),$

and the bias resulting from the approximation of $\psi_\Delta(u)$ by 1,

    $b_2(x_0) = E[\hat g(x_0)] - K_h \star g(x_0).$

We can provide the following bias bound:

Lemma 3.1. Under G3(1), G4($\beta$), G5 and if the kernel $K$ satisfies K1 and K2($\alpha$) with
$\alpha \ge \beta$,

    $|b(x_0)|^2 \le c_1 h^{2\beta} + c_1' \Delta^2$

with $c_1 = 2\left(\frac{L}{\lfloor\beta\rfloor!} \int |K(v)||v|^\beta\,dv\right)^2$ and $c_1' = 2\left(2\|g'\|_\infty \|g\|_1 \|K\|_1\right)^2$.
Moreover, the variance is controlled as follows:

Lemma 3.2. Under G1 and G2, and if the kernel satisfies K1, we have

    $\mathrm{Var}[\hat g(x_0)] \le \frac{1}{nh} \|K\|_2^2 \left( \frac{\|(g^*)'\|_1}{2\pi\Delta} + \|g\|_2^2 \right) \le c_2 \frac{1}{nh\Delta} + c_2' \frac{1}{nh}$

with $c_2 = \|(g^*)'\|_1 \|K\|_2^2/(2\pi)$ and $c_2' = \|K\|_2^2 \|g\|_2^2$.

Lemmas 3.1 and 3.2 lead us to the following risk bound:

Proposition 3.1. Under G1, G2, G3(1), G4($\beta$), G5 and if $K$ satisfies K1 and K2($\alpha$) with
$\alpha \ge \beta$, we have

(6)    $MSE(x_0, h) \le c_1 h^{2\beta} + c_2 \frac{1}{nh\Delta} + c_2' \frac{1}{nh} + c_1' \Delta^2.$

Recall that $\Delta = \Delta(n)$ is such that $\lim_{n \to +\infty} \Delta = 0$, thus $1/(nh)$ is negligible compared
to $1/(nh\Delta)$. For the first two terms the optimal choice of $h$ is $h_{opt} \propto (n\Delta)^{-1/(2\beta+1)}$ and the
associated rate has order $O\left((n\Delta)^{-2\beta/(2\beta+1)}\right)$. Next, a sufficient condition for
$\Delta^2 \le (n\Delta)^{-2\beta/(2\beta+1)}$ for all $\beta$ is

(7)    $\Delta = O(n^{-1/3}).$

Proposition 3.2. Under the assumptions of Proposition 3.1 and under condition (7),
the choice $h_{opt} \propto (n\Delta)^{-1/(2\beta+1)}$ minimizes the risk bound (6) and gives
$MSE(x_0, h_{opt}) = O\left((n\Delta)^{-2\beta/(2\beta+1)}\right)$. As a consequence,
$E[(\hat g(x_0)/x_0 - N(x_0))^2] = O\left((n\Delta)^{-2\beta/(2\beta+1)}\right)$.
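
The bias-variance trade-off behind this choice can be made explicit. Keeping only the two leading terms $c_1 h^{2\beta} + c_2/(nh\Delta)$ of the risk bound and setting the derivative in $h$ to zero gives $h^{2\beta+1} = c_2/(2\beta c_1 n\Delta)$. The following Python sketch illustrates this computation; the function names and the default constants `c1 = c2 = 1` are illustrative assumptions, not values from the text.

```python
import numpy as np


def risk_bound(h, n, delta, beta, c1=1.0, c2=1.0):
    """Leading terms of the risk bound: c1 * h^(2 beta) + c2 / (n h delta)."""
    return c1 * h ** (2 * beta) + c2 / (n * h * delta)


def h_opt(n, delta, beta, c1=1.0, c2=1.0):
    """Minimizer of risk_bound in h, obtained by setting the derivative to zero."""
    return (c2 / (2 * beta * c1 * n * delta)) ** (1.0 / (2 * beta + 1))
```

Evaluating `risk_bound(h_opt(n, delta, beta), n, delta, beta)` for increasing values of $n\Delta$ shows a decay proportional to $(n\Delta)^{-2\beta/(2\beta+1)}$, the rate stated above.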



We can link this result to the one of Figueroa-López (2011), who proves that his projection
estimator $\hat N$ is such that $(\hat N(x_0) - N(x_0))(n\Delta)^{\alpha}$ tends to a normal distribution for
any $0 < \alpha < \beta/(2\beta + 1)$.

The rate obtained in Proposition 3.2 turns out to be the optimal minimax rate of
convergence over the class $\mathcal{H}(\beta, L)$. This result is proved in Figueroa-López (2009a) in the
more general case of estimators based on the whole path of the process up to time $n\Delta$.
In our case of discrete sampling, another proof is given in Section 6.3 where we prove the
following result:

Theorem 3.1. Assume $\Delta = O(1)$ and $\Delta^{-1} = O(n)$. Let $x_0 \neq 0$. There exists $C > 0$ such
that for any estimator $\hat g_n(x_0)$ based on observations $Z_1^\Delta, \dots, Z_n^\Delta$, and for $n$ large enough,

    $\sup_{g \in \mathcal{H}(\beta, L)} E_g\left[(\hat g_n(x_0) - g(x_0))^2\right] \ge C (n\Delta)^{-2\beta/(2\beta+1)}.$

Obviously, the result is also true replacing $g$ by the Lévy density $N$.

3.2. Bandwidth selection. As $\beta$ is unknown, we need a data-driven selection of the
bandwidth. We follow ideas given in Goldenshluger and Lepski (2011) for density estimation.
We introduce a set of bandwidths of the form $H = \{1/j, 1 \le j \le M\}$ with $M$ an
integer to be specified later. Actually it is sufficient to control $\sum_{h \in H} h^{-w}$ for some $w$, so
that more general sets of bandwidths are possible. We set:

    $V(h) = C_0 \frac{\log(n\Delta)}{nh\Delta}$

with $C_0$ to be specified later. Note that $V(h)$ has the same order as the variance multiplied
by $\log(n\Delta)$. We also define $\hat g_{h,h'}(x_0) = K_{h'} \star \hat g_h(x_0) = K_h \star \hat g_{h'}(x_0)$. This auxiliary estimator
can also be written

    $\hat g_{h,h'}(x_0) = \frac{1}{n\Delta} \sum_{k=1}^{n} Z_k^\Delta\, K_{h'} \star K_h(x_0 - Z_k^\Delta).$

Lastly we set, as an estimator of the bias,

    $A(h, x_0) = \sup_{h' \in H} \left[ |\hat g_{h,h'}(x_0) - \hat g_{h'}(x_0)|^2 - V(h') \right]_+ .$

The adaptive bandwidth $\hat h$ is chosen as follows:

    $\hat h = \hat h(x_0) \in \arg\min_{h \in H} \{A(h, x_0) + V(h)\}.$
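
The selection rule above can be sketched in Python. The sketch makes a simplifying assumption that is not the choice made in this paper: it uses a Gaussian kernel, for which $K_{h'} \star K_h$ is again a Gaussian density with standard deviation $\sqrt{h^2 + h'^2}$, so that no numerical convolution is needed. The constant `c0` (playing the role of $C_0$), the bandwidth grid $\{1/j, 1 \le j \le M\}$ and all function names are illustrative.

```python
import numpy as np


def gauss(u, s):
    """Gaussian density with standard deviation s, i.e. K_s for a Gaussian K."""
    return np.exp(-0.5 * (u / s) ** 2) / (s * np.sqrt(2.0 * np.pi))


def select_bandwidth(x0, z, delta, c0, m):
    """Bandwidth selection at x0 over H = {1/j, j = 1..m}; needs n*delta > 1.

    Returns the selected bandwidth and the corresponding estimate of g(x0).
    """
    z = np.asarray(z, dtype=float)
    norm = z.size * delta
    hs = 1.0 / np.arange(1, m + 1)
    # Estimators g_hat_h(x0) for every h in H
    est = np.array([np.sum(z * gauss(x0 - z, h)) / norm for h in hs])
    # Penalty V(h) = c0 * log(n delta) / (n h delta)
    v = c0 * np.log(norm) / (norm * hs)
    crit = np.empty(m)
    for i, h in enumerate(hs):
        # Auxiliary estimators g_hat_{h,h'}(x0): Gaussian kernels convolve
        # into a Gaussian with standard deviation sqrt(h^2 + h'^2)
        aux = np.array(
            [np.sum(z * gauss(x0 - z, np.hypot(h, hp))) / norm for hp in hs]
        )
        a = np.max(np.maximum((aux - est) ** 2 - v, 0.0))
        crit[i] = a + v[i]
    best = int(np.argmin(crit))
    return hs[best], est[best]
```

The double loop costs $O(M^2 n)$ operations per point $x_0$, which remains cheap when $M$ is of order $(n\Delta)^{1/3}$.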
We can state the following oracle inequality. 

Theorem 3.2. We use a kernel satisfying K1 and a set of bandwidths $H = \{1/j, 1 \le j \le M\}$
with $M = O((n\Delta)^{1/3})$. Assume that $g$ satisfies G1, G2, G3(5) and take

(8)    $C_0 = C_0(c) = \frac{c}{2\pi} \|K\|_2^2 \left( \|(g^*)'\|_1 + \|g^*\|_2^2 \right)$

with $c \ge 16\max(1, \|K\|_\infty)$. Then, for $\Delta \le 1$,

    $E[|g(x_0) - \hat g_{\hat h}(x_0)|^2] \le C \left\{ \inf_{h \in H} \left\{ \|g - E[\hat g_h]\|_\infty^2 + V(h) \right\} + \dots \right\}$

Thus our estimator $\hat g_{\hat h}$ has a risk as good as any of the collection $(\hat g_h)_{h \in H}$, up to a
logarithmic term.

Note that the theorem is valid for $c$ large enough, say $c \ge c_0$. In the proof, we obtain
the upper bound $16\max(1, \|K\|_\infty)$ for $c_0$; unfortunately, we conjecture that this bound
is not optimal. To obtain a sharper bound we have tuned $c_0$ in the simulation
study.

The definition of the estimator uses $\|(g^*)'\|_1$ and $\|g^*\|_2$, but these quantities can be
estimated with a preliminary estimator of $g^*$. More precisely, we set $K_0 = 1_{[-1,1]}$ and



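    $\widehat{\|(g^*)'\|_1} = \int \Big| \sum_{k=1}^{n} \cdots \Big|\,du$ with $h_1 = (n\Delta)^{-1/3}$,

    $\widehat{\|g^*\|_2^2} = \int \Big| \frac{1}{n} \sum_{k=1}^{n} \cdots \Big|^2 du$ with $h_2 = (n\Delta)^{-1/3}$.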



We introduce the following regularity condition: a function $\psi$ belongs to the Sobolev
space Sob($\alpha$) if $\int |\psi^*(u)|^2 |u|^{2\alpha}\,du < \infty$. Then, reinforcing the conditions on $g$, we obtain
a similar theorem with an empirical $C_0$.

Theorem 3.3. We use a kernel satisfying K1 and K2($\alpha$) with $\alpha \ge 1$, and $M = O((n\Delta)^{1/3})$.
Assume that $g$ satisfies G1, G2, G3(32), G4(1), G5. Assume also that $g$ and $xg(x)$ belong
to Sob(1). Take

    $\hat C_0 = \frac{c}{2\pi} \|K\|_2^2 \left( \widehat{\|(g^*)'\|_1} + \widehat{\|g^*\|_2^2} \right)$

with $c \ge 32\max(1, \|K\|_\infty)$. Then, for $n^{-1} \le \Delta \le Cn^{-1/3}$,

    $E[|g(x_0) - \hat g_{\hat h}(x_0)|^2] \le C \left\{ \inf_{h \in H} \left\{ \|g - E[\hat g_h]\|_\infty^2 + E(\hat V(h)) \right\} + \dots \right\}$

Let us now conclude with the consequence of this theorem in terms of rate of convergence.
As already explained, as we need assumption G5 to control the bias, we can assume $\beta \ge 1$.
Then $h_{opt} \propto (\log(n\Delta)/n\Delta)^{1/(2\beta+1)} \ge (n\Delta)^{-1/3}$ belongs to $H$ as soon as $M$ is larger than
a constant times $(n\Delta)^{1/3}$. Hence we can state the following corollary.

Corollary 3.1. Assume that $g$ satisfies G1, G2, G3(5), G4($\beta$) with $\beta \ge 1$ and G5. We
choose a kernel satisfying K1 and K2($\alpha$) with $\alpha \ge \beta$, and $M = \lfloor (n\Delta)^{1/3} \rfloor$. Take $C_0$ as
in Theorem 3.2 (or as in Theorem 3.3 with the assumptions of this latter theorem). Then, if
$n^{-1} \le \Delta \le Cn^{-1/3}$,

    $E[|g(x_0) - \hat g_{\hat h}(x_0)|^2] = O\left( (\log(n\Delta)/n\Delta)^{2\beta/(2\beta+1)} \right).$

Thus the price to pay for adaptivity is a logarithmic loss in the rate. Nevertheless this
phenomenon is known to be unavoidable in pointwise estimation (see Butucea (2001)).
Thus $\hat g_{\hat h}(x_0)$ (resp. $\hat g_{\hat h}(x_0)/x_0$) is an adaptive estimator for $g(x_0)$ (resp. $N(x_0)$).



4. Examples and Simulations 

We have implemented the estimation method for four different processes (listed in
Examples 1-4 below) with the kernel described in (5) (with $l = 2$ and $u$ the Gaussian density).
The bandwidth set has been fixed to $H = \{1/j, 1 \le j \le M\}$ with $M = \lfloor 2(n\Delta)^{1/3} \rfloor$. For
the implementation, a difficulty is the proper calibration of the constant $c$ in (8). This is
usually done by a large number of preliminary simulations. We have chosen $c = 0.1$ as
the adequate value for a variety of models and numbers of observations. The estimation
and adaptation are done for 50 points $x_0$ on the abscissa interval. For clarity, we have
computed the Mean Integrated Square Error (MISE) of the estimators. Figures 1 and
2 plot ten estimated curves corresponding to our four examples with, in the first column,
$\Delta = 0.02$, $n = 5 \cdot 10^3$, and in the second, $\Delta = 0.05$, $n = 5 \cdot 10^4$. These parameter values
can be interpreted as roughly hourly observations over a few years.

Example 1. Let $L_t = \sum_{i=1}^{N_t} Y_i$, where $(N_t)$ is a Poisson process with constant intensity
$\lambda$ and $(Y_i)$ is a sequence of i.i.d. random variables with density $f$ independent of the process
$(N_t)$. Then, $(L_t)$ is a Lévy process with characteristic function

(9)    $\psi_t(u) = \exp\left( \lambda t \int_{\mathbb{R}} (e^{iux} - 1) f(x)\,dx \right).$

Its Lévy density is $N(x) = \lambda f(x)$ and thus $g(x) = \lambda x f(x)$. For our first example, we
choose $\lambda = 2$ and $f$ such that $g(x) = x f(x) = (1/2)\sqrt{x/2}$ for $0 < x < 2$. Then assumption
G4(1/2) holds (on $(0, 2)$), but not G4($\beta$) for other $\beta$. Since $\beta$ is small, the rate of
convergence is slow. The discontinuity at 2 damages the estimation, as can be seen in
Figure 1.
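
Increments of a compound Poisson process are straightforward to simulate, since each increment over an interval of length $\Delta$ is a sum of a Poisson($\lambda\Delta$) number of i.i.d. jumps. The Python sketch below is illustrative only: the function names are our own, and the jump sampler assumes that the jump density of Example 1 reads $f(x) = 1/(2\sqrt{2x})$ on $(0, 2)$, which corresponds to $x f(x) = (1/2)\sqrt{x/2}$ and can be sampled by inversion as $Y = 2U^2$ with $U$ uniform on $(0, 1)$.

```python
import numpy as np


def compound_poisson_increments(n, delta, lam, jump_sampler, rng):
    """Simulate n i.i.d. increments of a compound Poisson process.

    Each increment is the sum of a Poisson(lam * delta) number of jumps
    drawn by jump_sampler(rng, size).
    """
    counts = rng.poisson(lam * delta, size=n)
    jumps = jump_sampler(rng, int(counts.sum()))
    # Index of the increment each jump belongs to, then sum jumps per increment
    owner = np.repeat(np.arange(n), counts)
    return np.bincount(owner, weights=jumps, minlength=n)


def example1_jumps(rng, size):
    """Jumps with assumed density 1 / (2 sqrt(2 x)) on (0, 2): Y = 2 U^2."""
    return 2.0 * rng.random(size) ** 2
```

Under these assumptions the jumps have mean $2/3$, so the empirical mean of the increments divided by $\Delta$ should be close to $\lambda \cdot 2/3$, which gives a quick sanity check of the simulation.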

Example 2. Let $\alpha > 0$, $\gamma > 0$. The Lévy-Gamma process $(L_t)$ with parameters $(\gamma, \alpha)$
is such that, for all $t > 0$, $L_t$ has Gamma distribution with parameters $(\gamma t, \alpha)$, i.e. the
density:

    $\frac{\alpha^{\gamma t}}{\Gamma(\gamma t)}\, x^{\gamma t - 1} e^{-\alpha x}\, 1_{x \ge 0}.$

The Lévy density is $N(x) = \gamma x^{-1} e^{-\alpha x} 1_{x > 0}$ so that $g(x) = \gamma e^{-\alpha x} 1_{x > 0}$ satisfies assumptions
G1, G2 and G3(p). Here we choose $\alpha = \gamma = 1$. This example allows us to study the role
of the discontinuity at 0, which invalidates assumptions G4-G5. We can observe that the
estimation becomes very good if we move away from 0.
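
For the Lévy-Gamma process the increments can be sampled exactly, which makes it easy to compare a kernel estimate with the true $g$ away from 0. The following Python sketch is illustrative: the Gaussian kernel, the bandwidth and the function names are assumptions made for the example, and `gamma_` stands for the parameter $\gamma$.

```python
import numpy as np


def gamma_increments(n, delta, gamma_, alpha, rng):
    """Exact increments: Gamma with shape gamma_ * delta and rate alpha."""
    return rng.gamma(shape=gamma_ * delta, scale=1.0 / alpha, size=n)


def g_true(x, gamma_=1.0, alpha=1.0):
    """Target function g(x) = gamma * exp(-alpha x) for x > 0, else 0."""
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, gamma_ * np.exp(-alpha * x), 0.0)


def g_hat(x0, z, delta, h):
    """Kernel estimator of g at x0 with a Gaussian kernel (illustrative choice)."""
    k = np.exp(-0.5 * ((x0 - z) / h) ** 2) / (h * np.sqrt(2.0 * np.pi))
    return float(np.sum(z * k) / (z.size * delta))
```

With, say, `n = 50000`, `delta = 0.02` and `h = 0.2`, the value `g_hat(1.0, z, 0.02, 0.2)` can be compared with `g_true(1.0)`; repeating the comparison at points closer to 0 illustrates the effect of the discontinuity.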

Example 3. For our third example, we also choose a compound Poisson process, but
with $f$ the Gaussian density with variance $\delta^2$. Thus $g(x) = \lambda x f(x) = \lambda x e^{-x^2/(2\delta^2)}/(\delta\sqrt{2\pi})$
and $g^*(u) = i\lambda\delta^2 u\, e^{-\delta^2 u^2/2}$. Assumptions G1, G2, G3(p), G5 hold for $g$. Moreover $g$ belongs
to a Hölder class of regularity $\beta$ for all $\beta > 0$. Thus the rate is close to $(n\Delta/\log(n\Delta))^{-1}$,
and the good performance of our estimator is visible in Figure 2. Note that this is the so-called
Merton model used for describing the log price in financial modeling. Here we choose $\lambda = 2$
and $\delta = 0.3$.



Example 4. Our last example is the Variance Gamma process, as described in Madan et al.
(1998). It is used for modeling the dynamics of the logarithm of stock prices. The process
is obtained by evaluating a Brownian motion at a time given by a Lévy-Gamma process.
Denoting $(B_t)$ a standard Brownian motion, and $(X_t)$ a Lévy-Gamma process with
parameters $(1/\nu, 1/\nu)$ independent of $(B_t)$, we set $L_t = \theta X_t + \sigma B_{X_t}$. Then $(L_t)$ is a Lévy
process, with

As in Example 2, there is a discontinuity at 0. Here we choose $\theta = -0.1436$, $\sigma = 0.1213$,
$\nu = 0.1686$: these are estimates of parameters for the S&P index option prices studied in
Madan et al. (1998).

5. Irregular sampling

For high frequency data, it is frequent that the sampling is irregular, i.e. the interval $\Delta$ is
not necessarily the same at each time. In this section we consider the following framework.
The observations are $(L_{t_k}, k = 1, \dots, n)$ where $(L_t)$ is still a Lévy process with characteristic
function (1). For each $k \ge 1$, we denote by $\Delta_k = t_k - t_{k-1}$ the sampling intervals. Notice
that this includes the previous case when, for each $k$, $\Delta_k = \Delta$. The increments are denoted
by $Z_k = L_{t_k} - L_{t_{k-1}}$. In this context of irregular sampling, they are still independent but
with non-identical distributions: $Z_k$ has the same law as $L_{\Delta_k}$. To define an estimator,
we observe that $E\left[Z_k e^{iuZ_k}\right] = \Delta_k \psi_{\Delta_k}(u) g^*(u)$, and then, summing over $k$ and using
$\psi_{\Delta_k}(u) \simeq 1$,

    $E\left[\sum_{k=1}^{n} Z_k e^{iuZ_k}\right] = \sum_{k=1}^{n} \Delta_k \psi_{\Delta_k}(u)\, g^*(u) \simeq \Big(\sum_{k=1}^{n} \Delta_k\Big) g^*(u).$

Thus, denoting $\bar\Delta = \frac{1}{n} \sum_{k=1}^{n} \Delta_k$, we introduce

(10)    $\hat g_h^*(u) = \frac{1}{n\bar\Delta} \sum_{k=1}^{n} Z_k e^{iuZ_k} K^*(hu), \qquad \hat g_h(x) = \frac{1}{n\bar\Delta} \sum_{k=1}^{n} Z_k K_h(x - Z_k).$
Figure 1. Function $g$ (solid line) and estimators $\hat g_{\hat h}$ (dotted lines). Panels: Ex 1 ($n\Delta = 1000$),
MISE = 0.032; Ex 1 ($n\Delta = 2500$), MISE = 0.014; Ex 2 ($n\Delta = 1000$), MISE = 0.894;
Ex 2 ($n\Delta = 2500$), MISE = 0.057.



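Additionally, for all real $\delta$, we denote $\overline{\Delta^\delta} = \frac{1}{n} \sum_{k=1}^{n} \Delta_k^\delta$. We can bound the Mean Squared
Error of this estimate:

Proposition 5.1. Under G1, G2, G3(1), G4($\beta$), G5 and if $K$ satisfies K1 and K2($\alpha$) with
$\alpha \ge \beta$, we have

(11)    $MSE(x_0, h) \le c_1 h^{2\beta} + c_2 \frac{1}{nh\bar\Delta} + c_2' \frac{\overline{\Delta^2}}{nh\bar\Delta^2} + c_1' \left( \frac{\overline{\Delta^2}}{\bar\Delta} \right)^2$

with $c_1 = 2\left(\frac{L}{\lfloor\beta\rfloor!} \int |K(v)||v|^\beta\,dv\right)^2$, $c_1' = 2\left(2\|g'\|_\infty \|g\|_1 \|K\|_1\right)^2$, $c_2 = \|(g^*)'\|_1 \|K\|_2^2/(2\pi)$,
$c_2' = \|K\|_2^2 \|g\|_2^2$.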

The proof is similar to the case of regular sampling, therefore it is omitted. 
Figure 2. Function $g$ (solid line) and estimators $\hat g_{\hat h}$ (dotted lines). Panels: Ex 3 ($n\Delta = 1000$),
MISE = 0.009; Ex 3 ($n\Delta = 2500$), MISE = 0.002; Ex 4 ($n\Delta = 1000$), MISE = 0.811;
Ex 4 ($n\Delta = 2500$), MISE = 0.375.



In this section, we are still interested in the high frequency context: the asymptotic
framework is $\bar\Delta \to 0$ and $n\bar\Delta \to \infty$ when $n \to \infty$. We shall also assume that

(12)    $\frac{(\overline{\Delta^2})^2}{\bar\Delta} = O(n^{-1}).$

Condition (12) is verified for instance if $\Delta_k = Ck^{-a}$ with $a \in [1/3, 1]$. Then we find the
same rate of convergence replacing $\Delta$ by $\bar\Delta$:

Proposition 5.2. Under the assumptions of Proposition 5.1 and under condition (12),
the choice $h_{opt} \propto (n\bar\Delta)^{-1/(2\beta+1)}$ minimizes the risk bound (11) and gives
$MSE(x_0, h_{opt}) = O\left((n\bar\Delta)^{-2\beta/(2\beta+1)}\right)$.
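
Condition (12) is easy to check numerically for a given sampling scheme. The Python sketch below assumes polynomially decreasing steps $\Delta_k = Ck^{-a}$ and computes $n (\overline{\Delta^2})^2/\bar\Delta$, which should stay bounded in $n$ when the condition holds; the function name and the default constant are illustrative.

```python
import numpy as np


def condition_12_ratio(n, a, c=1.0):
    """n * (mean of Delta_k^2)^2 / (mean of Delta_k) for Delta_k = c * k^(-a)."""
    k = np.arange(1, n + 1, dtype=float)
    d = c * k ** (-a)
    return n * np.mean(d**2) ** 2 / np.mean(d)
```

For $a = 1/3$ the ratio stabilizes as $n$ grows, whereas for a smaller exponent such as $a = 0.1$ it keeps increasing, in line with the range of exponents given above.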
As already noticed in Comte and Genon-Catalot, other estimation strategies
than (10) are possible. For each real $\delta$, we obtain an estimator by setting

    $\hat g_h^{(\delta)}(x) = \frac{1}{n\overline{\Delta^{\delta+1}}} \sum_{k=1}^{n} \Delta_k^\delta Z_k K_h(x - Z_k).$

Under suitable conditions, this estimate has an MSE bounded by a constant times
$\left( n \big(\overline{\Delta^{\delta+1}}\big)^2 / \overline{\Delta^{2\delta+1}} \right)^{-2\beta/(2\beta+1)}$.
But, for all $\delta$, by the Schwarz inequality, $\big(\overline{\Delta^{\delta+1}}\big)^2 / \overline{\Delta^{2\delta+1}} \le \bar\Delta$. That is why we prefer
estimator (10).
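
The comparison between the weighting schemes can be illustrated numerically. The sketch below computes the quantity $n (\overline{\Delta^{\delta+1}})^2 / \overline{\Delta^{2\delta+1}}$ that drives the rate of the weighted estimator; by the Schwarz inequality it never exceeds $n\bar\Delta$, with equality for $\delta = 0$. The function name `effective_sample_size` and the argument `s` (standing for $\delta$) are illustrative choices.

```python
import numpy as np


def effective_sample_size(deltas, s):
    """n * (mean of Delta_k^(s+1))^2 / (mean of Delta_k^(2s+1))."""
    d = np.asarray(deltas, dtype=float)
    return d.size * np.mean(d ** (s + 1)) ** 2 / np.mean(d ** (2 * s + 1))
```

For any positive sampling steps, `effective_sample_size(deltas, s)` is at most `deltas.sum()`, so the unweighted choice `s = 0` gives the largest effective sample size and hence the best rate among these estimators.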

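To build an adaptive estimator, we use the same method of bandwidth selection. The
set of bandwidths is still $H = \{1/j, 1 \le j \le M\}$. We also define

    $\hat g_{h,h'}(x_0) = \frac{1}{n\bar\Delta} \sum_{k=1}^{n} Z_k\, K_{h'} \star K_h(x_0 - Z_k)$

and we set as previously $A(h, x_0) = \sup_{h' \in H} \left[ |\hat g_{h,h'}(x_0) - \hat g_{h'}(x_0)|^2 - V(h') \right]_+$ with

    $V(h) = C_0 \frac{\log(n\bar\Delta)}{nh\bar\Delta}.$

Then the estimator is $\hat g_{\hat h}(x_0)$ with $\hat h = \hat h(x_0) \in \arg\min_{h \in H} \{A(h, x_0) + V(h)\}$.

We can state the following oracle inequality (the proof is very similar to the one of
Theorem 3.2 and is therefore omitted).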

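Theorem 5.1. We use a kernel satisfying K1 and $M = O((n\bar\Delta)^{1/3})$. Assume that $g$
satisfies G1, G2, G3(5) and take

(13)    $C_0 = \frac{c}{2\pi} \|K\|_2^2 \left( \|(g^*)'\|_1 + \|g^*\|_2^2 \right)$

with $c \ge 16\max(1, \|K\|_\infty)$. Then, if $(\overline{\Delta^2})^2/\bar\Delta \le 1$,

    $E[|g(x_0) - \hat g_{\hat h}(x_0)|^2] \le C \left\{ \inf_{h \in H} \left\{ \|g - E[\hat g_h]\|_\infty^2 + V(h) \right\} + \dots \right\}$

Moreover, if $g$ satisfies G5, G4($\beta$) with $\beta \ge 1$ and the kernel satisfies K1 and K2($\alpha$)
with $\alpha \ge \beta$, and $M = \lfloor (n\bar\Delta)^{1/3} \rfloor$, $\bar\Delta \ge n^{-1}$ and $(\overline{\Delta^2})^2/\bar\Delta = O(n^{-1})$, then

    $E[|g(x_0) - \hat g_{\hat h}(x_0)|^2] = O\left( (\log(n\bar\Delta)/n\bar\Delta)^{2\beta/(2\beta+1)} \right).$

Thus the rate of convergence in this case of irregular sampling is $(\log(n\bar\Delta)/n\bar\Delta)^{2\beta/(2\beta+1)}$
provided that $(\overline{\Delta^2})^2/\bar\Delta = O(n^{-1})$.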

6. Proofs 



Let us first state two useful propositions (see Proposition 2.1 in Comte and Genon-Catalot
(2010b) and Proposition 2.1 in Comte and Genon-Catalot (2009) for a proof).

Proposition 6.1. Denote by $P_\Delta$ the distribution of $Z_1^\Delta$ and define $\mu_\Delta(dx) = \Delta^{-1} x P_\Delta(dx)$.
If $\int_{\mathbb{R}} |x| N(x)\,dx < \infty$, the distribution $\mu_\Delta$ has a density $h_\Delta$ given by

    $h_\Delta(x) = \int g(x - y) P_\Delta(dy) = E\, g(x - Z_1^\Delta).$

Proposition 6.2. Let $p \ge 1$ be an integer such that $\int_{\mathbb{R}} |x|^{p-1} |g(x)|\,dx < \infty$. Then
$E(|Z_1^\Delta|^p) < \infty$ and $E[(Z_1^\Delta)^p] = \Delta \int_{\mathbb{R}} x^{p-1} g(x)\,dx + o(\Delta)$. Moreover, if $g$ is integrable,
$E(|Z_1^\Delta|) \le 2\Delta \|g\|_1$.

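6.1. Proof of Lemma 3.1. First, we study $b_2(x_0)$ using Proposition 6.1:

    $b_2(x_0) = \frac{1}{\Delta} E\left[ Z_1^\Delta \frac{1}{h} K\!\left( \frac{x_0 - Z_1^\Delta}{h} \right) \right] - \int \frac{1}{h} K\!\left( \frac{x_0 - u}{h} \right) g(u)\,du$

    $\phantom{b_2(x_0)} = \int \frac{1}{h} K\!\left( \frac{x_0 - u}{h} \right) E[g(u - Z_1^\Delta) - g(u)]\,du.$

Now, applying the mean value theorem to $g$, we get

    $|b_2(x_0)| \le \int \frac{1}{h} \left| K\!\left( \frac{x_0 - u}{h} \right) \right| E\left| Z_1^\Delta g'(u_Z) \right| du \le \|g'\|_\infty \|K\|_1 E|Z_1^\Delta|$

with $u_Z \in [u - Z_1^\Delta, u]$. From the results of Proposition 6.2 we obtain

(14)    $|b_2(x_0)| \le 2\|g'\|_\infty \|K\|_1 \|g\|_1 \Delta.$

To study $b_1(x_0) = K_h \star g(x_0) - g(x_0)$, it is sufficient to use Taylor's theorem and G4($\beta$)
(this is a classic computation, see Tsybakov (2009) for details) and we obtain

(15)    $|b_1(x_0)| \le h^\beta \frac{L}{\lfloor\beta\rfloor!} \int |K(v)||v|^\beta\,dv.$

Gathering (14) and (15) completes the proof of Lemma 3.1. □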



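6.2. Proof of Lemma 3.2. As the $Z_k^\Delta$ are i.i.d., we have:

    $\mathrm{Var}[\hat g(x_0)] = \mathrm{Var}\left[ \frac{1}{nh\Delta} \sum_{k=1}^{n} Z_k^\Delta K\!\left( \frac{x_0 - Z_k^\Delta}{h} \right) \right] = \frac{1}{n(h\Delta)^2} \mathrm{Var}\left[ Z_1^\Delta K\!\left( \frac{x_0 - Z_1^\Delta}{h} \right) \right].$

Thus,

    $\mathrm{Var}[\hat g(x_0)] \le \frac{1}{n(h\Delta)^2} E\left[ (Z_1^\Delta)^2 K^2\!\left( \frac{x_0 - Z_1^\Delta}{h} \right) \right].$

Writing

    $K\!\left( \frac{x_0 - Z_1^\Delta}{h} \right) = \frac{1}{2\pi} \int K^*(u) e^{-i(x_0 - Z_1^\Delta)u/h}\,du,$

we obtain with $v = u/h$

    $\mathrm{Var}[\hat g(x_0)] \le \frac{1}{n\Delta^2} E\left[ (Z_1^\Delta)^2 \left| \frac{1}{2\pi} \int K^*(vh) e^{-i(x_0 - Z_1^\Delta)v}\,dv \right|^2 \right]$

    $\phantom{\mathrm{Var}[\hat g(x_0)]} \le \frac{1}{n\Delta^2 (2\pi)^2} E\left| \iint Z_1^\Delta e^{iZ_1^\Delta v} K^*(vh) e^{-ix_0 v}\, \overline{Z_1^\Delta e^{iZ_1^\Delta u} K^*(uh) e^{-ix_0 u}}\,dv\,du \right|.$

Using Fubini and $E[(Z_1^\Delta)^2 e^{i(v-u)Z_1^\Delta}] = -\psi_\Delta''(v - u)$ we find

    $\mathrm{Var}[\hat g(x_0)] \le \frac{1}{n\Delta^2 (2\pi)^2} \iint |\psi_\Delta''(v - u) K^*(vh) K^*(uh)|\,dv\,du.$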
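Now the following formula

    $\psi_\Delta'' = i\Delta \psi_\Delta' g^* + i\Delta \psi_\Delta g^{*\prime} = -\Delta^2 \psi_\Delta g^{*2} + i\Delta \psi_\Delta g^{*\prime}$

gives $\mathrm{Var}[\hat g(x_0)] \le T_1 + T_2$ with

    $T_1 = \frac{1}{n\Delta^2 (2\pi)^2} \iint |\Delta^2 \psi_\Delta(v - u) (g^*)^2(v - u) K^*(vh) K^*(uh)|\,dv\,du,$

    $T_2 = \frac{1}{n\Delta^2 (2\pi)^2} \iint |\Delta \psi_\Delta(v - u) (g^*)'(v - u) K^*(vh) K^*(uh)|\,dv\,du.$

We first bound $T_2$ using the Schwarz inequality:

    $T_2 \le \frac{1}{n\Delta (2\pi)^2} \sqrt{\iint |\psi_\Delta(v - u)||(g^*)'(v - u)||K^*(vh)|^2\,dv\,du}\ \sqrt{\iint |\psi_\Delta(v - u)||(g^*)'(v - u)||K^*(uh)|^2\,dv\,du}$

    $\phantom{T_2} \le \frac{1}{n\Delta (2\pi)^2} \int |K^*(vh)|^2\,dv \int |\psi_\Delta(z)||(g^*)'(z)|\,dz$

    $\phantom{T_2} \le \frac{1}{nh\Delta (2\pi)^2} \int |K^*(u)|^2\,du \int |(g^*)'(z)|\,dz,$ because $|\psi_\Delta(z)| \le 1$,

where $(g^*)'$ exists and is integrable by G2. Following the same line for the study of $T_1$, we
get

    $T_1 \le \frac{\|K\|_2^2}{2\pi nh} \int |(g^*)^2(z)|\,dz \le \frac{\|K\|_2^2 \|g\|_2^2}{nh}.$

This completes the proof of Lemma 3.2. □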

6.3. Proof of the lower bound. Here we prove Theorem 3.1. The essence of the proof
is to build two functions $g_0$ and $g_1$ which are far apart in terms of pointwise distance but with
close associated distributions. Let

    $g_0(x) = x f_\Lambda(x) = \frac{1}{\pi} \frac{\Lambda x}{1 + (\Lambda x)^2}$

where $f_\Lambda$ is the density of the Cauchy distribution $C(0, \Lambda)$ with scale parameter $\Lambda$. Here $\Lambda$
is a positive and small enough real (it will be made precise later). Now let $K$ be an infinitely
differentiable and even function such that $\int K = 0$, $K(0) \neq 0$ and $K(x) = |x|^{-2}$ for $|x|$
large enough (say for $|x| \ge B$). Using this auxiliary function $K$, we can define

    $g_1(x) = g_0(x) + c h_n^\beta K\!\left( \frac{x - x_0}{h_n} \right) x$

where $c$ is a constant to be specified later and

    $h_n = (n\Delta)^{-1/(2\beta+1)}.$
We denote $N_0(x) = g_0(x)/x$ and $N_1(x) = g_1(x)/x$. Remark that if $L_{0,t} = \sum_{i=1}^{N_t} Y_i$ is a
compound Poisson process with $N_t$ a Poisson process of intensity 1 and $Y_i$ Cauchy $C(0, \Lambda)$
variables, then its characteristic function is

    $\psi_{0,t}(u) = \exp\left( t \int (e^{iux} - 1) N_0(x)\,dx \right)$

and $Z_k^\Delta = L_{0,k\Delta} - L_{0,(k-1)\Delta}$ has distribution $P_0(dx) = e^{-\Delta} \delta_0(dx) + \varphi_0(x)\,dx$ with

    $\varphi_0(x) = \sum_{k=1}^{\infty} e^{-\Delta} \frac{\Delta^k}{k!} f_\Lambda^{\star k}(x).$

Moreover $N_1$ is a density. Indeed the definition of $K$ guarantees that $\int N_1(x)\,dx =
\int N_0(x)\,dx + c h_n^{\beta+1} \int K(x)\,dx = 1$. And to ensure the positivity of $N_1$, it is sufficient to
prove that $|N_1 - N_0| \le N_0$. But, if $|x| > |x_0| + Bh_n$,

    $N_0^{-1}(x) |N_1(x) - N_0(x)| \le Cc h_n^{\beta+2} x^2 |x - x_0|^{-2} \le 1$

for $c$ small enough, and if $|x| \le |x_0| + Bh_n$,

    $N_0^{-1}(x) |N_1(x) - N_0(x)| \le Cc h_n^\beta \left(1 + (\Lambda(|x_0| + Bh_n))^2\right) \|K\|_\infty \le 1$

for $c$ small enough. Then, if $L_{1,t} = \sum_{i=1}^{N_t} Y_i$ with $N_t$ a Poisson process of intensity 1 and $Y_i$
random variables with density $N_1$, it is a Lévy process with Lévy measure $N_1(x)\,dx$. We
denote by $\psi_{1,\Delta}$ the characteristic function of $L_{1,\Delta}$ with distribution $P_1$, and by $\varphi_1$ the function
such that $P_1(dx) = e^{-\Delta} \delta_0(dx) + \varphi_1(x)\,dx$.

Now let us denote for two probability measures $P$ and $Q$, $\chi^2(P, Q) = \int (dP/dQ - 1)^2\,dQ$.
In the sequel we show that

1) $g_0, g_1$ belong to $\mathcal{H}(\beta, L)$,

2) $|g_1(x_0) - g_0(x_0)| \ge C (n\Delta)^{-\beta/(2\beta+1)}$,

3) $\chi^2(P_1^n, P_0^n) \le C < \infty$ where $P_0^n$ (resp. $P_1^n$) is the distribution of a sample
$Z_1^\Delta, \dots, Z_n^\Delta$ such that the associated Lévy process $L_0$ (resp. $L_1$) has Lévy measure
$N_0(x)\,dx$ (resp. $N_1(x)\,dx$).

Then it is sufficient to use Theorem 2.2 (see also p. 80) in Tsybakov (2009) to obtain
Theorem 3.1. In the following we denote all constants by $C$, even if it changes from line to line.
Proof of 1). Belonging to the Hölder space

To prove that our hypotheses belong to $\mathcal{H}(\beta, L)$, it is sufficient to show that, for $i = 0, 1$,
$\|g_i^{(k+1)}\|_p \le L$ where $k = \lfloor\beta\rfloor$ and $p^{-1} = 1 + k - \beta$. Indeed the Hölder inequality gives

    $|g_i^{(k)}(x) - g_i^{(k)}(y)| = \left| \int g_i^{(k+1)}(v) 1_{[x,y]}(v)\,dv \right| \le \|g_i^{(k+1)}\|_p |x - y|^{\beta - k}$ for all $x, y$.

When $x$ goes to infinity, $g_0^{(k+1)}(x) = C\Lambda^{-1} x^{-k-2} + o(x^{-k-2})$ so it belongs to $L^p$ since
$p(k + 2) = (k + 2)/(k + 1 - \beta) > 1$. Choosing $\Lambda$ small enough ensures $\|g_0^{(k+1)}\|_p \le L/2 \le L$.
Now to study $g_1$, we can write

    $(g_1 - g_0)^{(k+1)}(x) = c\,x K^{(k+1)}\!\left( \frac{x - x_0}{h_n} \right) h_n^{\beta-k-1} + c(k + 1) K^{(k)}\!\left( \frac{x - x_0}{h_n} \right) h_n^{\beta-k}.$

Let us see if these two terms are in $L^p$. Writing $x = x - x_0 + x_0$ and changing variables,

    $\int \left| x K^{(k+1)}\!\left( \frac{x - x_0}{h_n} \right) \right|^p dx \le 2^{p-1} h_n^{p+1} \int |v K^{(k+1)}(v)|^p\,dv + 2^{p-1} |x_0|^p h_n \int |K^{(k+1)}(v)|^p\,dv.$

These integrals are finite since $v K^{(k+1)}(v) = Cv^{-(2+k)}$ for $v$ large enough and $p(k + 2) =
(k + 2)/(k + 1 - \beta) > 1$. In the same way

    $\int \left| K^{(k)}\!\left( \frac{x - x_0}{h_n} \right) \right|^p dx \le h_n \int |K^{(k)}(v)|^p\,dv.$

Thus

    $\|(g_1 - g_0)^{(k+1)}\|_p^p \le Cc^p \left( h_n h_n^{p(\beta-k-1)} + h_n h_n^{p(\beta-k)} \right) \le Cc^p h_n^{p(1/p+\beta-k-1)} \le Cc^p \le (L/2)^p$

for suitable $c$. Then $g_1 - g_0$ belongs to $\mathcal{H}(\beta, L/2)$ and $g_1$ belongs to $\mathcal{H}(\beta, L)$.
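Proof of 2). Rate

By assumption, $x_0 \neq 0$ and we can see that $|g_1(x_0) - g_0(x_0)| = c h_n^\beta |K(0) x_0|$ with $K(0) \neq 0$.
Since $h_n = (n\Delta)^{-1/(2\beta+1)}$, this quantity has the announced order of the rate: $(n\Delta)^{-\beta/(2\beta+1)}$.

Proof of 3). Chi-square divergence

Since the observations are i.i.d., $\chi^2(P_1^n, P_0^n) = (1 + \chi^2(P_1, P_0))^n - 1$. Thus, it is sufficient
to prove that $\chi^2(P_1, P_0) = O(n^{-1})$ where

    $\chi^2(P_1, P_0) = \int_{x \neq 0} \left( \frac{\varphi_1(x)}{\varphi_0(x)} - 1 \right)^2 \varphi_0(x)\,dx.$

Indeed $P_1(\{0\}) = e^{-\Delta} = P_0(\{0\})$. Now let us remark that for $n$ large enough

    $\varphi_0(x) = \sum_{k=1}^{\infty} e^{-\Delta} \frac{\Delta^k}{k!} f_\Lambda^{\star k}(x) \ge e^{-\Delta} \Delta f_\Lambda(x) \ge \Delta e^{-\Delta} \Lambda \pi^{-1} / (1 + (\Lambda x)^2)$

since $\Delta$ is bounded. Then $\varphi_0(x) \ge C^{-1} \Delta x^{-2}$ for $|x|$ large enough, say $|x| \ge A$, and $\varphi_0(x) \ge
C^{-1} \Delta$ for $|x| < A$. Next we write $\chi^2(P_1, P_0) = \int_{x \neq 0} (\varphi_1(x) - \varphi_0(x))^2 (\varphi_0(x))^{-1}\,dx = I_1 + I_2$
where $I_1$ is the integral for $|x| < A$ and $I_2$ for $|x| \ge A$. We will bound these two terms
separately.

Since $\varphi_0(x) \ge C^{-1} \Delta$ for $|x|$ small,

    $I_1 = \int_{|x| < A} (\varphi_1(x) - \varphi_0(x))^2 (\varphi_0(x))^{-1}\,dx \le C\Delta^{-1} \int_{|x| < A} (\varphi_1(x) - \varphi_0(x))^2\,dx.$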

For $i = 0, 1$, the Fourier transform of $\varphi_i$ is $\varphi_i^* = \psi_{i,\Delta}(u) - P_i(\{0\})$. Thus the Parseval equality gives

    $I_1 \le C\Delta^{-1} \int |\psi_{1,\Delta}(u) - \psi_{0,\Delta}(u)|^2\,du.$

In order to get a bound on $|\psi_{1,\Delta} - \psi_{0,\Delta}|$, we apply the mean value theorem:

    $|\psi_{1,\Delta}(u) - \psi_{0,\Delta}(u)| \le \sup_{z \in I_u} |e^z| \left| \Delta \int (e^{iux} - 1)(N_1(x) - N_0(x))\,dx \right|$

where $I_u$ is the segment in $\mathbb{C}$ between $a_u = \Delta \int (e^{iux} - 1) N_0(x)\,dx$ and $b_u = \Delta \int (e^{iux} -
1) N_1(x)\,dx$. But

    $\int (e^{iux} - 1)(N_1(x) - N_0(x))\,dx = c h_n^\beta \int (e^{iux} - 1) K\!\left( \frac{x - x_0}{h_n} \right) dx = c h_n^{\beta+1} e^{iux_0} K^*(h_n u).$

Note that this quantity is well defined since $K$ belongs to $L^1$. Thus

    $|\psi_{1,\Delta}(u) - \psi_{0,\Delta}(u)| \le \left( \sup_{z \in I_u} e^{\Re(z)} \right) \Delta c h_n^{\beta+1} |K^*(h_n u)|$

where $\Re(x)$ means the real part of $x$. We can compute $\Re(a_u) = a_u = \Delta(N_0^*(u) - 1) =
\Delta(\exp(-|u/\Lambda|) - 1) \le 0$ and

    $\Re(b_u) = \Re(\Delta(N_0^*(u) - 1 + (N_1 - N_0)^*(u))) = \Delta(N_0^*(u) - 1 + c h_n^{\beta+1} \Re(K^*(h_n u) e^{iux_0})).$

Since $K$ is even,

    $\Re(b_u) = \Delta(\exp(-|u/\Lambda|) - 1 + c h_n^{\beta+1} K^*(h_n u) \cos(ux_0)) \le c\Delta h_n^{\beta+1} \|K^*\|_\infty \le C$

so that

(16)    $|\psi_{1,\Delta}(u) - \psi_{0,\Delta}(u)| \le e^C \Delta c h_n^{\beta+1} |K^*(h_n u)|.$

Then

(17)    $I_1 \le C\Delta^{-1} \int |\Delta h_n^{\beta+1} K^*(h_n u)|^2\,du \le C\Delta h_n^{2\beta+1}.$

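Let us now bound the term $I_2$, using that $\varphi_0(x) \geq C^{-1} \Delta x^{-2}$ for $|x|$ large enough:

$$I_2 = \int_{|x| > A} (\varphi_1(x) - \varphi_0(x))^2 (\varphi_0(x))^{-1} dx \leq C \Delta^{-1} \int x^2 (\varphi_1(x) - \varphi_0(x))^2 dx.$$

But $F = \varphi_1 - \varphi_0$ has Fourier transform

$$F^* = \psi_{1,\Delta} - \psi_{0,\Delta} = \exp\left( \Delta( e^{-|u/A|} + c h_n^{\beta+1} K^*(h_n u) e^{iux_0} - 1) \right) - \exp\left( \Delta( e^{-|u/A|} - 1) \right)$$

and this function is differentiable everywhere except at $u = 0$, with derivative

$$F^{*\prime} = \Delta \gamma_1 \psi_{1,\Delta} - \Delta \gamma_0 \psi_{0,\Delta}$$

where

$$\gamma_0(u) = -\mathrm{sign}(u) \, e^{-|u/A|}/A, \qquad \gamma_1(u) = \gamma_0(u) + c h_n^{\beta+1} e^{iux_0} \left( i x_0 K^*(h_n u) + h_n K^{*\prime}(h_n u) \right).$$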

Let us now prove that the Fourier transform of $F^{*\prime}$ is $-2\pi i x F(-x)$. Let us write the factorization

(18) $\Delta^{-1} F^{*\prime} = \gamma_1 \psi_{1,\Delta} - \gamma_0 \psi_{0,\Delta} = (\gamma_1 - \gamma_0) \psi_{1,\Delta} + \gamma_0 (\psi_{1,\Delta} - \psi_{0,\Delta})$

with $|\psi_{1,\Delta}| \leq 1$. Since $K^*$ and $K^{*\prime}$ are uniformly bounded, $\gamma_1 - \gamma_0$ is bounded as well. In the same way, the inequality (16) entails that $\|\psi_{1,\Delta} - \psi_{0,\Delta}\|_\infty < \infty$, so that $F^{*\prime}$ is bounded. Thus $F^*$ is Lipschitz and absolutely continuous. Moreover, using again (18), we can see that $F^{*\prime}$ is integrable (we can choose $K$ such that $K^*$ is integrable, for example take for $K$ the difference between the Cauchy density and the normal density). Then, according to Rudin (1987), the Fourier transform of $F^{*\prime}$ is $-ixF^{**}(x)$ (it is in fact a simple integration by parts). Since $F^*$ is integrable, $F^{**}(x) = 2\pi F(-x)$ almost everywhere, and we have proved that $(F^{*\prime})^*(x) = -2\pi i x F(-x)$ a.e. Next, the Parseval equality provides $\int |xF(x)|^2 dx = (2\pi)^{-1} \int |F^{*\prime}(u)|^2 du$. Thus

$$I_2 \leq C \Delta^{-1} \int |xF(x)|^2 dx \leq C \Delta (2\pi)^{-1} \int |\gamma_1 \psi_{1,\Delta} - \gamma_0 \psi_{0,\Delta}|^2.$$

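Hence, using the factorization (18), we can split $I_2 \leq \pi^{-1} C \Delta (I_{2,1} + I_{2,2})$ with

$$I_{2,1} = \int |\gamma_1 - \gamma_0|^2, \qquad I_{2,2} = \int |\gamma_0 (\psi_{1,\Delta} - \psi_{0,\Delta})|^2.$$

Using the definition of $\gamma_1$, we compute

(19) $I_{2,1} = c^2 h_n^{2\beta+2} \int |i x_0 K^*(h_n u) + h_n K^{*\prime}(h_n u)|^2 du \leq 2 c^2 h_n^{2\beta+1} \left( x_0^2 \int |K^*|^2 + h_n^2 \int |K^{*\prime}|^2 \right).$

Now, in order to deal with $I_{2,2}$, we use the previous bound (16) on $|\psi_{1,\Delta} - \psi_{0,\Delta}|$:

$$I_{2,2} \leq C c^2 \Delta^2 h_n^{2\beta+2} \int |\gamma_0(u) K^*(h_n u)|^2 du$$

(20) $\qquad \leq C c^2 \Delta^2 h_n^{2\beta+2} \|K^*\|_\infty^2 \|\gamma_0\|_2^2 \leq C h_n^{2\beta+1}$

since $\Delta$ is bounded.

Finally, by gathering (17), (19) and (20), we get

$$\chi^2(P_1, P_0) \leq C \Delta h_n^{2\beta+1} = O(n^{-1}).$$

This ends the proof of Theorem 3.1. □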

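6.4. Proof of Theorem 3.2. The goal is to bound $\mathbb{E}[|g(x_0) - \hat g_{\hat h}(x_0)|^2]$. To do this, we fix $h \in H$. We write

$$|\hat g_{\hat h}(x_0) - g(x_0)| \leq |\hat g_{\hat h}(x_0) - \hat g_{h,\hat h}(x_0)| + |\hat g_{h,\hat h}(x_0) - \hat g_h(x_0)| + |\hat g_h(x_0) - g(x_0)|.$$

So we have

$$|g(x_0) - \hat g_{\hat h}(x_0)|^2 \leq 3 |\hat g_{\hat h}(x_0) - \hat g_{h,\hat h}(x_0)|^2 + 3 |\hat g_{h,\hat h}(x_0) - \hat g_h(x_0)|^2 + 3 |\hat g_h(x_0) - g(x_0)|^2.$$

Define $B := |\hat g_{\hat h}(x_0) - \hat g_{h,\hat h}(x_0)|^2$ and $C := |\hat g_{h,\hat h}(x_0) - \hat g_h(x_0)|^2$.

We have $A(h) \geq |\hat g_{h,\hat h}(x_0) - \hat g_{\hat h}(x_0)|^2 - V(\hat h) \geq B - V(\hat h)$. So $B \leq A(h) + V(\hat h)$.

Moreover, since $\hat g_{h,\hat h} = \hat g_{\hat h,h}$ by commutativity of the convolution, $A(\hat h) \geq |\hat g_{\hat h,h}(x_0) - \hat g_h(x_0)|^2 - V(h) \geq C - V(h)$. So $C \leq A(\hat h) + V(h)$.

Therefore,

$$|g(x_0) - \hat g_{\hat h}(x_0)|^2 \leq 3(A(h) + V(\hat h)) + 3(A(\hat h) + V(h)) + 3 |\hat g_h(x_0) - g(x_0)|^2.$$

Now, by definition of $\hat h$, $A(\hat h) + V(\hat h) \leq A(h) + V(h)$. This allows us to write

$$|g(x_0) - \hat g_{\hat h}(x_0)|^2 \leq 6 A(h) + 6 V(h) + 3 |\hat g_h(x_0) - g(x_0)|^2.$$

Let us denote $b_h(x_0) = \mathbb{E}[\hat g_h(x_0)] - g(x_0)$ and $b_{h,2}(x_0) = \mathbb{E}[\hat g_h(x_0)] - K_h \star g(x_0)$ (these are the same notation as in Lemma 3.1, but with subscript $h$). Thus

$$\mathbb{E}[|g(x_0) - \hat g_{\hat h}(x_0)|^2] \leq 6 \mathbb{E}[A(h)] + 6 V(h) + 3 b_h^2(x_0) + 3 \mathrm{Var}(\hat g_h(x_0))$$

$$\leq 6 \mathbb{E}[A(h)] + 3 b_h^2(x_0) + C_2 V(h).$$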

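It remains to bound $\mathbb{E}[A(h)]$. Let us denote $g_{h,h'} = \mathbb{E}[\hat g_{h,h'}]$ and $g_h = \mathbb{E}[\hat g_h]$. We write

(21) $\hat g_{h,h'} - \hat g_{h'} = (\hat g_{h,h'} - g_{h,h'}) - (\hat g_{h'} - g_{h'}) + (g_{h,h'} - g_{h'}),$

and we study the last term of the above decomposition. We have

$$|g_{h,h'}(x_0) - g_{h'}(x_0)| = |\mathbb{E}[\hat g_{h,h'}(x_0) - \hat g_{h'}(x_0)]|$$

$$= |\mathbb{E}[K_{h'} \star \hat g_h(x_0) - \hat g_{h'}(x_0)]|$$

$$= |K_{h'} \star \mathbb{E}[\hat g_h - g](x_0) + K_{h'} \star g(x_0) - \mathbb{E}[\hat g_{h'}(x_0)]|.$$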

This can be written:

$$|g_{h,h'}(x_0) - g_{h'}(x_0)| = |K_{h'} \star b_h(x_0) - b_{h',2}(x_0)| \leq \|b_h\|_\infty \int |K(v)| dv + |b_{h',2}(x_0)|.$$

Now $|b_{h',2}(x_0)| \leq \|b_h\|_\infty$, so that

(22) $|g_{h,h'}(x_0) - g_{h'}(x_0)|^2 \leq 2 \|b_h\|_\infty^2 \left( \int |K(v)| dv \right)^2 + 2 |b_{h',2}(x_0)|^2 \leq 2 (\|K\|_1^2 + 1) \|b_h\|_\infty^2.$

Then by inserting (22) in decomposition (21), we find:

$$A(h) = \sup_{h'} \left\{ |\hat g_{h,h'}(x_0) - \hat g_{h'}(x_0)|^2 - V(h') \right\}_+$$

(23) $\qquad \leq 3 \sup_{h'} \left\{ |\hat g_{h,h'}(x_0) - g_{h,h'}(x_0)|^2 - V(h')/6 \right\}_+ + 3 \sup_{h'} \left\{ |\hat g_{h'}(x_0) - g_{h'}(x_0)|^2 - V(h')/6 \right\}_+ + 6 (\|K\|_1^2 + 1) \|b_h\|_\infty^2.$



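We can prove the following concentration result:

Proposition 6.3. Assume that $g$ satisfies G1, G2, G3(5), $K$ satisfies K1, $M = O((n\Delta)^{1/3})$ and take $c$ in (2) such that $c \geq 16 \max(1, \|K\|_\infty)$. Then

(24) $\mathbb{E}\left[ \sup_{h'} \left\{ |\hat g_{h'}(x_0) - g_{h'}(x_0)|^2 - V(h')/6 \right\}_+ \right] = O\left( \frac{\log(n\Delta)}{n\Delta} \right),$

(25) $\mathbb{E}\left[ \sup_{h'} \left\{ |\hat g_{h,h'}(x_0) - g_{h,h'}(x_0)|^2 - V(h')/6 \right\}_+ \right] = O\left( \frac{\log(n\Delta)}{n\Delta} \right).$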

Inequalities (24) and (25) together with (23) imply

$$\mathbb{E}[|g(x_0) - \hat g_{\hat h}(x_0)|^2] \leq C_1 \|b_h\|_\infty^2 + C_2 V(h) + C_3 \frac{\log(n\Delta)}{n\Delta}.$$

This completes the proof of Theorem 3.2. □
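The selection rule analysed in this proof, namely minimising $A(h) + V(h)$ over the bandwidth grid, can be sketched numerically. The snippet below is only a minimal illustration under stated assumptions and is not taken from the paper: it assumes a Gaussian kernel (so that $K_h \star K_{h'}$ is again Gaussian, with bandwidth $\sqrt{h^2 + h'^2}$), it replaces the constant in front of $\log(n\Delta)/(n\Delta h)$ in $V(h)$ by a hypothetical parameter `v_const`, and it simulates compound Poisson increments with exponential jumps purely for demonstration. All function names are illustrative.

```python
import math
import random

def gauss(x, h):
    """Gaussian kernel K_h(x) = K(x/h)/h (an assumed choice of K)."""
    return math.exp(-0.5 * (x / h) ** 2) / (h * math.sqrt(2.0 * math.pi))

def g_hat(z, delta, x0, h):
    """Kernel estimator (1/(n*delta)) * sum_k Z_k K_h(x0 - Z_k) of g(x0)."""
    return sum(zk * gauss(x0 - zk, h) for zk in z) / (len(z) * delta)

def select_bandwidth(z, delta, x0, grid, v_const=1.0):
    """Return the bandwidth minimising A(h) + V(h) over the grid.

    v_const is a hypothetical stand-in for the constant appearing in V(h).
    """
    n = len(z)
    V = {h: v_const * math.log(n * delta) / (n * delta * h) for h in grid}
    est = {h: g_hat(z, delta, x0, h) for h in grid}
    crit = {}
    for h in grid:
        a = 0.0  # positive part of the supremum over h'
        for hp in grid:
            # K_{h'} * g_hat_h: for Gaussian kernels the bandwidths add in quadrature
            aux = g_hat(z, delta, x0, math.hypot(h, hp))
            a = max(a, (aux - est[hp]) ** 2 - V[hp])
        crit[h] = a + V[h]
    h_sel = min(crit, key=crit.get)
    return h_sel, est[h_sel], crit

def cp_increment(rng, delta):
    """One increment over a step delta: rate-1 Poisson number of Exp(1) jumps."""
    t, total = rng.expovariate(1.0), 0.0
    while t < delta:
        total += rng.expovariate(1.0)
        t += rng.expovariate(1.0)
    return total

rng = random.Random(0)
n, delta = 2000, 0.1
z = [cp_increment(rng, delta) for _ in range(n)]
M = 10
grid = [k / M for k in range(1, M + 1)]  # H = {k/M, 1 <= k <= M}
h_sel, g_sel, crit = select_bandwidth(z, delta, x0=1.0, grid=grid)
```

In this toy setting the target value is $g(1) = e^{-1}$, since the Lévy density of the simulated process is $N(x) = e^{-x}$ on $(0, \infty)$; the quality of `g_sel` depends on `v_const`, which would have to be calibrated in practice.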



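6.5. Proof of Theorem 3.3. In all this proof, we shall use the following notation:

$$\hat\theta_\Delta(u) = \frac{1}{n} \sum_{k=1}^{n} Z_k^\Delta e^{iuZ_k^\Delta}, \qquad \hat\eta_\Delta(u) = \frac{1}{n} \sum_{k=1}^{n} (Z_k^\Delta)^2 e^{iuZ_k^\Delta},$$

and $\theta_\Delta(u) = \mathbb{E}\hat\theta_\Delta(u)$, $\eta_\Delta(u) = \mathbb{E}\hat\eta_\Delta(u)$. We also denote $f(x) = xg(x)$, so that $f^*(u) = -i(g^*)'(u)$ is estimated by $\hat f^*_{h_1}(u) = \hat\eta_\Delta(u) K_0^*(uh_1)/\Delta$. Now, let

$$\Omega = \left\{ \|g^* - \hat g^*_{h_2}\|_2 \leq \|g^*\|_2 (1 - 1/\sqrt{2}) \ \text{ and } \ \|f^* - \hat f^*_{h_1}\|_1 \leq \|f^*\|_1/2 \right\}.$$

The proof is decomposed in three steps. First we shall prove that the inequality is true on $\Omega$, i.e.

$$\mathbb{E}[|g(x_0) - \hat g_{\hat h}(x_0)|^2 \mathbf{1}_\Omega] \leq C \left\{ \inf_{h \in H} \left\{ \|g - \mathbb{E}[\hat g_h]\|_\infty^2 + \mathbb{E}(V(h)) \right\} + \frac{\log(n\Delta)}{n\Delta} \right\}.$$

The second step is to show the rough upper bound

$$\mathbb{E}[|g(x_0) - \hat g_{\hat h}(x_0)|^4] \leq C (n\Delta)^{2/3}.$$

Finally we will show that $\mathbb{P}(\Omega^c) \leq C (n\Delta)^{-8/3}$. Consequently

$$\mathbb{E}[|g(x_0) - \hat g_{\hat h}(x_0)|^2 \mathbf{1}_{\Omega^c}] \leq \sqrt{\mathbb{E}[|g(x_0) - \hat g_{\hat h}(x_0)|^4] \, \mathbb{P}(\Omega^c)} \leq C (n\Delta)^{-1}$$

and the theorem is proved.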
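The empirical quantities $\hat\theta_\Delta$ and $\hat\eta_\Delta$ used throughout this proof are plain sample averages, and the Fourier-domain estimator of $g^*$ is a spectral cut-off of $\hat\theta_\Delta/\Delta$. The following sketch shows one way to compute them; it assumes that $K_0^*$ is the indicator of $[-1, 1]$ (as suggested by the identity $1 - K_0^*(uh_2) = \mathbf{1}_{|uh_2| > 1}$ used in the third step), and the function names and toy data are purely illustrative.

```python
import cmath

def theta_hat(z, u):
    """Sample average of Z_k * exp(i*u*Z_k)."""
    return sum(zk * cmath.exp(1j * u * zk) for zk in z) / len(z)

def eta_hat(z, u):
    """Sample average of Z_k^2 * exp(i*u*Z_k)."""
    return sum(zk ** 2 * cmath.exp(1j * u * zk) for zk in z) / len(z)

def g_star_hat(z, delta, u, h2):
    """Cut-off estimate K0*(u*h2) * theta_hat(u) / delta, with K0* = 1 on [-1, 1]."""
    return theta_hat(z, u) / delta if abs(u * h2) <= 1.0 else 0.0

z = [0.0, 0.3, 0.0, -1.2, 0.7]  # toy increments, for illustration only
delta = 0.1
low_freq = g_star_hat(z, delta, 5.0, 0.1)    # |u*h2| = 0.5: frequency kept
high_freq = g_star_hat(z, delta, 20.0, 0.1)  # |u*h2| = 2: frequency removed
```

At $u = 0$ the two averages reduce to the first and second empirical moments of the increments, which gives a quick sanity check of an implementation.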
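• First step:

Following the proof of Theorem 3.2, we can obtain

$$\mathbb{E}[|g(x_0) - \hat g_{\hat h}(x_0)|^2 \mathbf{1}_\Omega] \leq 6 \mathbb{E}[A(h) \mathbf{1}_\Omega] + 3 b_h^2(x_0) + C_2 \mathbb{E}(V(h)).$$

Using the definition of $A(h)$, it is then sufficient to prove

(26) $\mathbb{E}\left[ \sup_{h'} \left\{ |\hat g_{h'}(x_0) - g_{h'}(x_0)|^2 - V(h')/6 \right\}_+ \mathbf{1}_\Omega \right] = O\left( \frac{\log(n\Delta)}{n\Delta} \right),$

(27) $\mathbb{E}\left[ \sup_{h'} \left\{ |\hat g_{h,h'}(x_0) - g_{h,h'}(x_0)|^2 - V(h')/6 \right\}_+ \mathbf{1}_\Omega \right] = O\left( \frac{\log(n\Delta)}{n\Delta} \right)$

to obtain the result. Now, let us remark that on $\Omega$

$$\|g^*\|_2^2 \leq 2 \|\hat g^*_{h_2}\|_2^2 \quad \text{and} \quad \|f^*\|_1 \leq 2 \|\hat f^*_{h_1}\|_1$$

with $\|f^*\|_1 = \|(g^*)'\|_1$, so that

$$V(h') \geq \frac{c}{2} \frac{1}{2\pi} \|K\|_2^2 \left( \|(g^*)'\|_1 + \|g^*\|_2^2 \right) \frac{\log(n\Delta)}{n\Delta h'}.$$

Then, using Proposition 6.3, since $c/2 \geq 16 \max(1, \|K\|_\infty)$,

$$\mathbb{E}\left[ \sup_{h'} \left\{ |\hat g_{h'}(x_0) - g_{h'}(x_0)|^2 - V(h')/6 \right\}_+ \mathbf{1}_\Omega \right]$$

$$\leq \mathbb{E}\left[ \sup_{h'} \left\{ |\hat g_{h'}(x_0) - g_{h'}(x_0)|^2 - \frac{c/2}{6} \frac{1}{2\pi} \|K\|_2^2 \left( \|(g^*)'\|_1 + \|g^*\|_2^2 \right) \frac{\log(n\Delta)}{n\Delta h'} \right\}_+ \right] = O\left( \frac{\log(n\Delta)}{n\Delta} \right)$$

and we prove (27) in the same way.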
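• Second step:

First, using Lemma 3.1, $|g_{\hat h}(x_0) - g(x_0)|^2 \leq \sup_{h \in H} (c_1 h^{2\beta} + c_1' \Delta^2) \leq C$. Then the bias term is uniformly bounded. Let us now study the variance term. We can write

$$\hat g_h(x_0) = \frac{1}{2\pi} \int e^{-ix_0 u} K^*(uh) \frac{\hat\theta_\Delta(u)}{\Delta} du$$

and, since every $h \in H$ is larger than $1/M$,

$$|\hat g_{\hat h}(x_0) - g_{\hat h}(x_0)| \leq \frac{1}{2\pi} \sup_{h \in H} \int |K^*(uh)| \left| \frac{\hat\theta_\Delta(u) - \theta_\Delta(u)}{\Delta} \right| du \leq \frac{M}{2\pi} \sum_{h \in H} \int |K^*(u)| \left| \frac{\hat\theta_\Delta(u/h) - \theta_\Delta(u/h)}{\Delta} \right| du.$$

With a convexity inequality,

$$|\hat g_{\hat h}(x_0) - g_{\hat h}(x_0)|^4 \leq \frac{M^7}{(2\pi)^4} \sum_{h \in H} \left( \int |K^*(u)| \left| \frac{\hat\theta_\Delta(u/h) - \theta_\Delta(u/h)}{\Delta} \right| du \right)^4.$$

Next, we use the following inequality (obtained with two uses of the Schwarz inequality):

$$\mathbb{E}\left[ \left( \int \phi(u) du \right)^4 \right] = \int \mathbb{E}[\phi(u_1) \cdots \phi(u_4)] \, du_1 \cdots du_4 \leq \int \mathbb{E}^{1/4}[\phi(u_1)^4] \cdots \mathbb{E}^{1/4}[\phi(u_4)^4] \, du_1 \cdots du_4 = \left( \int \mathbb{E}^{1/4}[\phi(u)^4] du \right)^4.$$

Thus,

$$\mathbb{E}[|\hat g_{\hat h}(x_0) - g_{\hat h}(x_0)|^4] \leq \frac{M^7}{(2\pi)^4} \sum_{h \in H} \left( \int |K^*(u)| \, \mathbb{E}^{1/4}\left[ \left| \frac{\hat\theta_\Delta(u/h) - \theta_\Delta(u/h)}{\Delta} \right|^4 \right] du \right)^4.$$

But, according to Proposition 2.3 in Comte and Genon-Catalot (2009), under G3(2p), for $p \geq 1$, $\Delta^{-2p} \mathbb{E}|\hat\theta_\Delta(v) - \theta_\Delta(v)|^{2p} \leq C (n\Delta)^{-p}$. Hence, under G3(4),

$$\mathbb{E}|\hat g_{\hat h}(x_0) - g_{\hat h}(x_0)|^4 \leq C M^7 \sum_{h \in H} \left( \int |K^*(u)| (n\Delta)^{-1/2} du \right)^4$$

$$\leq C \|K^*\|_1^4 M^8 (n\Delta)^{-2} \leq C \|K^*\|_1^4 (n\Delta)^{2/3}.$$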

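• Third step:

$$\mathbb{P}(\Omega^c) = \mathbb{P}\left( \|g^* - \hat g^*_{h_2}\|_2 > \|g^*\|_2 (1 - 1/\sqrt{2}) \ \text{ or } \ \|f^* - \hat f^*_{h_1}\|_1 > \|f^*\|_1/2 \right)$$

$$\leq \left( \|g^*\|_2 (1 - 1/\sqrt{2}) \right)^{-8} \mathbb{E}\|g^* - \hat g^*_{h_2}\|_2^8 + \left( \|f^*\|_1/2 \right)^{-16} \mathbb{E}\|\hat f^*_{h_1} - f^*\|_1^{16}$$

$$\leq C \left( \mathbb{E}\|\hat g^*_{h_2} - g^*_{h_2}\|_2^8 + \|g^*_{h_2} - g^*\|_2^8 + \mathbb{E}\|\hat f^*_{h_1} - f^*_{h_1}\|_1^{16} + \|f^*_{h_1} - f^*\|_1^{16} \right),$$

where $g^*_{h_2} = \mathbb{E}[\hat g^*_{h_2}]$ and $f^*_{h_1} = \mathbb{E}[\hat f^*_{h_1}]$. Thus we have four terms to upper bound.

First term: Since $\hat g^*_{h_2}(u) = K_0^*(uh_2) \hat\theta_\Delta(u)/\Delta$,

$$\|\hat g^*_{h_2} - g^*_{h_2}\|_2^2 = \frac{1}{h_2} \int |K_0^*(u)|^2 \left| \frac{\hat\theta_\Delta(u/h_2) - \theta_\Delta(u/h_2)}{\Delta} \right|^2 du.$$

Then, under G3(8),

$$\mathbb{E}\|\hat g^*_{h_2} - g^*_{h_2}\|_2^8 \leq \frac{1}{h_2^4} \left( \int |K_0^*(u)|^2 \, \mathbb{E}^{1/4}\left[ \left| \frac{\hat\theta_\Delta(u/h_2) - \theta_\Delta(u/h_2)}{\Delta} \right|^8 \right] du \right)^4$$

$$\leq \frac{C}{h_2^4} \left( \int |K_0^*(u)|^2 (n\Delta)^{-1} du \right)^4 \leq C \|K_0^*\|_2^8 M^4 (n\Delta)^{-4} \leq 16 C (n\Delta)^{-8/3}.$$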

Second term: Since $g^*_{h_2}(u) = K_0^*(uh_2) g^*(u) \psi_\Delta(u)$, we can decompose the bias into

$$g^*(u) - g^*_{h_2}(u) = g^*(u)(1 - K_0^*(uh_2)) + g^*(u) K_0^*(uh_2)(1 - \psi_\Delta(u)) = b_1 + b_2.$$

Using that $\int |g^*(u)|^2 u^2 du < \infty$,

$$\|b_1\|_2^2 = \int |g^*(u)(1 - K_0^*(uh_2))|^2 du = \int |g^*(u)|^2 \mathbf{1}_{|uh_2| > 1} du \leq \int |g^*(u)|^2 |uh_2|^2 du \leq C h_2^2.$$

On the other hand, using that $|1 - \psi_\Delta(u)| \leq |u| \Delta \|g\|_1$ (see Proposition 2.3 in Comte and Genon-Catalot (2009)),

$$\|b_2\|_2^2 = \int |g^*(u) K_0^*(uh_2)(1 - \psi_\Delta(u))|^2 du \leq C \Delta^2 \int |g^*(u) u|^2 du \leq C \Delta^2 \leq C (n\Delta)^{-1}.$$

Thus, taking $h_2 = (n\Delta)^{-1/3}$ gives $\|g^* - g^*_{h_2}\|_2^8 \leq C h_2^8 + C (n\Delta)^{-4} \leq C (n\Delta)^{-8/3}$.

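Third term: Since $\hat f^*_{h_1}(u) = K_0^*(uh_1) \hat\eta_\Delta(u)/\Delta$,

$$\|\hat f^*_{h_1} - f^*_{h_1}\|_1 \leq \frac{1}{h_1} \int |K_0^*(u)| \left| \frac{\hat\eta_\Delta(u/h_1) - \eta_\Delta(u/h_1)}{\Delta} \right| du.$$

Next, we use the following inequality

$$\mathbb{E}\left[ \left( \int \phi(u) du \right)^{16} \right] \leq \left( \int \mathbb{E}^{1/16}[\phi(u)^{16}] du \right)^{16}.$$

Exactly as in Comte and Genon-Catalot (2009), using the Rosenthal inequality, we can prove under G3(4p), for $p \geq 1$, $\Delta^{-2p} \mathbb{E}|\hat\eta_\Delta(v) - \eta_\Delta(v)|^{2p} \leq C (n\Delta)^{-p}$. Then, under G3(32),

$$\mathbb{E}\|\hat f^*_{h_1} - f^*_{h_1}\|_1^{16} \leq \frac{1}{h_1^{16}} \left( \int |K_0^*(u)| \, \mathbb{E}^{1/16}\left[ \left| \frac{\hat\eta_\Delta(u/h_1) - \eta_\Delta(u/h_1)}{\Delta} \right|^{16} \right] du \right)^{16} \leq C h_1^{-16} (n\Delta)^{-8} \leq C (n\Delta)^{-8/3}$$

since $h_1 = (n\Delta)^{-1/3}$.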

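Fourth term: Since $\eta_\Delta = -\psi_\Delta'' = \Delta f^* \psi_\Delta + \Delta^2 (g^*)^2 \psi_\Delta$, we can decompose the bias into

$$f^*(u) - f^*_{h_1}(u) = f^*(u) - K_0^*(uh_1) f^*(u) \psi_\Delta(u) - \Delta K_0^*(uh_1)(g^*(u))^2 \psi_\Delta(u)$$

$$= f^*(u)(1 - K_0^*(uh_1)) + f^*(u) K_0^*(uh_1)(1 - \psi_\Delta(u)) - \Delta K_0^*(uh_1)(g^*(u))^2 \psi_\Delta(u)$$

$$= b_1 + b_2 + b_3.$$

Since $\int |f^*(u)|^2 u^2 du < \infty$,

$$\|b_1\|_1 = \int |f^*(u)(1 - K_0^*(uh_1))| du = \int |f^*(u)| \mathbf{1}_{|uh_1| > 1} du$$

$$\leq \left( \int |f^*(u)|^2 |uh_1|^2 du \int |uh_1|^{-2} \mathbf{1}_{|uh_1| > 1} du \right)^{1/2} \leq C h_1^{1/2}.$$

On the other hand, using that $|1 - \psi_\Delta(u)| \leq |u| \Delta \|g\|_1$,

$$\|b_2\|_1 \leq \int |f^*(u) K_0^*(uh_1)(1 - \psi_\Delta(u))| du \leq C \Delta \int |f^*(u) u K_0^*(uh_1)| du$$

$$\leq C \Delta \left( \int |f^*(u) u|^2 du \int |K_0^*(uh_1)|^2 du \right)^{1/2} \leq C \Delta h_1^{-1/2} \leq C (h_1 n\Delta)^{-1/2}$$

and

$$\|b_3\|_1 \leq \Delta \int |K_0^*(uh_1)(g^*(u))^2 \psi_\Delta(u)| du \leq \Delta \int |(g^*(u))^2| du \leq C \Delta \leq C (n\Delta)^{-1/2}.$$

Thus $\|f^* - f^*_{h_1}\|_1^{16} \leq C h_1^8 + C (h_1 n\Delta)^{-8} + C (n\Delta)^{-8} \leq C (n\Delta)^{-8/3}$.

This completes the proof of Theorem 3.3. □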



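6.6. Proof of Proposition 6.3. Note that

(28) $\hat g_{h'}(x_0) - g_{h'}(x_0) = \frac{1}{n} \sum_{k=1}^{n} \left[ \frac{Z_k^\Delta}{\Delta} K_{h'}(x_0 - Z_k^\Delta) - \mathbb{E}\left( \frac{Z_k^\Delta}{\Delta} K_{h'}(x_0 - Z_k^\Delta) \right) \right].$

In order to apply a Bernstein inequality, since the $Z_k^\Delta$'s are not bounded, we truncate these variables and consider the following decomposition:

$$\{|Z_k^\Delta| \leq \mu_n\} \quad \text{and} \quad \{|Z_k^\Delta| > \mu_n\}$$

where

(29) $\mu_n = \mu_n(h') = \frac{\|K\|_2^2 \left( \|(g^*)'\|_1 + \|g^*\|_2^2 \right)}{2\pi \|K\|_\infty \sqrt{V(h')/6}}.$

We then decompose (28) as follows

$$\hat g_{h'}(x_0) - g_{h'}(x_0) = \frac{1}{n} \sum_{k=1}^{n} \left[ W_k(h') + T_k(h') - \mathbb{E}(W_k(h') + T_k(h')) \right] = S_n(W(h')) + S_n(T(h'))$$

where $S_n(X)$ means $(1/n) \sum_{k=1}^{n} [X_k - \mathbb{E}(X_k)]$ and

(30) $W_k(h) = \frac{Z_k^\Delta}{\Delta} K_h(x_0 - Z_k^\Delta) \mathbf{1}_{\{|Z_k^\Delta| \leq \mu_n\}},$

(31) $T_k(h) = \frac{Z_k^\Delta}{\Delta} K_h(x_0 - Z_k^\Delta) \mathbf{1}_{\{|Z_k^\Delta| > \mu_n\}}.$

Thus

$$\mathbb{E}\left[ \sup_{h'} \left\{ |\hat g_{h'}(x_0) - g_{h'}(x_0)|^2 - V(h')/6 \right\}_+ \right] \leq 2 \sum_{h' \in H} \mathbb{E}\left[ S_n(W(h'))^2 - V(h')/12 \right]_+ + 2 \sum_{h' \in H} \mathbb{E}\left[ S_n(T(h'))^2 \right].$$

Then we use the two following lemmas.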

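Lemma 6.1. Assume that $g$ satisfies G1, G2, $K$ satisfies K1, and $c \geq 16$, $M = O((n\Delta)^{1/3})$. Then there exists $C > 0$ only depending on $K$ and $g$ such that

$$\sum_{h \in H} \mathbb{E}\left[ S_n^2(W(h)) - V(h)/12 \right]_+ \leq C \frac{\log(n\Delta)}{n\Delta}.$$

Lemma 6.2. Under assumptions K1, G3(5) and if $M = O((n\Delta)^{1/3})$,

$$\sum_{h \in H} \mathbb{E}\left[ S_n^2(T(h)) \right] \leq \frac{C'}{n\Delta}.$$

Lemmas 6.1 and 6.2 yield

$$\mathbb{E}\left[ \sup_{h'} \left\{ |\hat g_{h'}(x_0) - g_{h'}(x_0)|^2 - V(h')/6 \right\}_+ \right] \leq C'' \left( \frac{1}{n\Delta} + \frac{\log(n\Delta)}{n\Delta} \right).$$

Inequality (25) is obtained by following the same lines as for inequality (24) with $K_{h'}$ replaced by $K_h \star K_{h'}$. This ends the proof of Proposition 6.3. □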



6.7. Proof of Lemma 6.1. First, note that

$$\mathbb{E}\left[ S_n^2(W(h)) - V(h)/12 \right]_+ \leq \int_0^\infty \mathbb{P}\left( S_n^2(W(h)) \geq V(h)/12 + x \right) dx$$

$$\leq \int_0^\infty V(h) \, \mathbb{P}\left( |S_n(W(h))| \geq \sqrt{V(h)(1/12 + y)} \right) dy.$$

Next, we recall the classical Bernstein inequality (see e.g. Birgé and Massart (1998) for a proof):

Lemma 6.3. Let $W_1, \ldots, W_n$ be $n$ independent and identically distributed random variables and $S_n(W) = (1/n) \sum_{i=1}^{n} [W_i - \mathbb{E}(W_i)]$. Then, for $\eta > 0$,

$$\mathbb{P}(|S_n(W)| \geq \eta) \leq 2 \exp\left( -\frac{n\eta^2}{2(v^2 + b\eta)} \right) \leq 2 \max\left( \exp\left( -\frac{n\eta^2}{4v^2} \right), \exp\left( -\frac{n\eta}{4b} \right) \right),$$

where $\mathrm{Var}(W_1) \leq v^2$ and $|W_1| \leq b$.
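The second inequality of Lemma 6.3 follows from $2(v^2 + b\eta) \leq 4 \max(v^2, b\eta)$. As a quick numerical sanity check (an illustration only, with arbitrarily chosen parameter values and hypothetical function names), the sketch below evaluates both sides on a small grid.

```python
import math

def bernstein_bound(n, eta, v2, b):
    """Left-hand bound: 2*exp(-n*eta^2 / (2*(v^2 + b*eta)))."""
    return 2.0 * math.exp(-n * eta ** 2 / (2.0 * (v2 + b * eta)))

def split_bound(n, eta, v2, b):
    """Right-hand bound: 2*max(exp(-n*eta^2/(4*v^2)), exp(-n*eta/(4*b)))."""
    return 2.0 * max(math.exp(-n * eta ** 2 / (4.0 * v2)),
                     math.exp(-n * eta / (4.0 * b)))

# arbitrary test grid; v2 stands for the variance bound v^2
checks = [
    (n, eta, v2, b)
    for n in (10, 100, 1000)
    for eta in (0.01, 0.1, 1.0)
    for v2 in (0.05, 1.0, 20.0)
    for b in (0.1, 1.0, 10.0)
]
# small relative tolerance to absorb floating-point rounding
ok = all(bernstein_bound(*c) <= split_bound(*c) * (1.0 + 1e-12) for c in checks)
```

The split form is the one used below, because each of the two exponentials can be integrated in $y$ in closed form.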



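We apply this form of Bernstein inequality to $W_i(h)$ defined by (30) and $\eta = \sqrt{(1/12 + y) V(h)}$. Using Lemma 3.2 and $\Delta \leq 1$, it is easy to see that

$$\mathrm{Var}(W_1(h)) \leq v^2 := \frac{\|K\|_2^2 \left( \|(g^*)'\|_1 + \|g^*\|_2^2 \right)}{2\pi \Delta h} \quad \text{and} \quad |W_1(h)| \leq b := \frac{\|K\|_\infty \mu_n}{\Delta h}.$$

We find

$$\exp\left( -\frac{n\eta^2}{4v^2} \right) = \exp\left( -\frac{\pi (1/12) V(h) n\Delta h}{2 \|K\|_2^2 ( \|(g^*)'\|_1 + \|g^*\|_2^2 )} \right) \times \exp\left( -\frac{\pi y V(h) n\Delta h}{2 \|K\|_2^2 ( \|(g^*)'\|_1 + \|g^*\|_2^2 )} \right) = (n\Delta)^{-c/48} \times (n\Delta)^{-cy/4}$$

and

$$\exp\left( -\frac{n\eta}{4b} \right) \leq (n\Delta)^{-c/48} \times (n\Delta)^{-c\sqrt{y}/(4\sqrt{12})}.$$

Then we deduce

$$\mathbb{E}\left[ S_n^2(W(h)) - V(h)/12 \right]_+ \leq 2 \int_0^\infty V(h) (n\Delta)^{-c/48} \max\left( (n\Delta)^{-cy/4}, (n\Delta)^{-c\sqrt{y}/(4\sqrt{12})} \right) dy$$

$$\leq 2 V(h) (n\Delta)^{-c/48} \left( \int_0^\infty (n\Delta)^{-cy/4} dy + \int_0^\infty (n\Delta)^{-c\sqrt{y}/(4\sqrt{12})} dy \right)$$

$$\leq 2 V(h) (n\Delta)^{-c/48} \left( \frac{4}{c \log(n\Delta)} + \frac{384}{c^2 \log^2(n\Delta)} \right)$$

using that $\int_0^\infty e^{-y/A} dy = A$ and $\int_0^\infty e^{-\sqrt{y}/A} dy = 2A^2$. Replacing $V(h)$ by its value, it gives

$$\sum_{h \in H} \mathbb{E}\left[ S_n^2(W(h)) - V(h)/12 \right]_+ \leq \frac{C_0}{n\Delta} (n\Delta)^{-c/48} \left( 1 + \frac{96}{c \log(n\Delta)} \right) \sum_{h \in H} \frac{1}{h}.$$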
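The two integral identities used in this computation, $\int_0^\infty e^{-y/A} dy = A$ and $\int_0^\infty e^{-\sqrt{y}/A} dy = 2A^2$ (the second follows from the substitution $y = t^2$), can be checked numerically. The sketch below uses a simple midpoint rule on a truncated interval; the value of `A`, the truncation points and the helper `integrate` are arbitrary choices made for illustration.

```python
import math

def integrate(f, upper, steps=200000):
    """Composite midpoint rule for the integral of f over [0, upper]."""
    dx = upper / steps
    return sum(f((i + 0.5) * dx) for i in range(steps)) * dx

A = 0.5
# the integrands are below exp(-40) beyond the truncation points
i1 = integrate(lambda y: math.exp(-y / A), upper=40 * A)
i2 = integrate(lambda y: math.exp(-math.sqrt(y) / A), upper=(40 * A) ** 2)
```

With $A = 4/(c \log(n\Delta))$ and $A = 4\sqrt{12}/(c \log(n\Delta))$ respectively, these identities give the two terms $4/(c \log(n\Delta))$ and $384/(c^2 \log^2(n\Delta))$.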

Recall that $H = \{k/M, \ 1 \leq k \leq M\}$. Then

$$\sum_{h \in H} \frac{1}{h} = \sum_{k=1}^{M} \frac{M}{k} \leq C \log(M) M \leq C \log(n\Delta) (n\Delta)^{1/3}.$$

Finally

$$\sum_{h \in H} \mathbb{E}\left[ S_n^2(W(h)) - V(h)/12 \right]_+ \leq C (n\Delta)^{-2/3 - c/48} \left( \log(n\Delta) + \frac{96}{c} \right) \leq C (n\Delta)^{-1} \left( \log(n\Delta) + \frac{96}{c} \right)$$

as soon as $c \geq 16$. This completes the proof of Lemma 6.1. □

6.8. Proof of Lemma 6.2. For a fixed bandwidth $h$ in $H$, we can establish the following bound:



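$$\mathbb{E}[|S_n(T(h))|^2] = \mathrm{Var}\left( \frac{1}{n} \sum_{k=1}^{n} \frac{Z_k^\Delta}{\Delta} K_h(x_0 - Z_k^\Delta) \mathbf{1}_{\{|Z_k^\Delta| > \mu_n\}} \right)$$

$$\leq \frac{1}{n\Delta^2} \frac{\|K\|_\infty^2}{h^2} \mathbb{E}\left[ (Z_1^\Delta)^2 \mathbf{1}_{\{|Z_1^\Delta| > \mu_n\}} \right] \leq \frac{1}{n\Delta} \frac{\|K\|_\infty^2 \, \mathbb{E}[|Z_1^\Delta|^{w+2}/\Delta]}{h^2 \mu_n^w}$$

for any $w > 0$. Recall that, according to Proposition 6.2, $\mathbb{E}[|Z_1^\Delta|^{w+2}/\Delta]$ is bounded under G3(w + 2). We search conditions for $\sum_{h \in H} \frac{1}{h^2 \mu_n^w} \leq$ constant. The following equalities hold up to constants:

$$\sum_{h \in H} \frac{1}{h^2 \mu_n^w} \approx \sum_{h \in H} \frac{V(h)^{w/2}}{h^2} \approx \frac{\log(n\Delta)^{w/2}}{(n\Delta)^{w/2}} \sum_{h \in H} \frac{1}{h^{2+w/2}}.$$

Since $h = k/M$, this provides

$$\sum_{h \in H} \frac{1}{h^{2+w/2}} = \sum_{k=1}^{M} \left( \frac{M}{k} \right)^{2+w/2} = M^{2+w/2} \sum_{k=1}^{M} \frac{1}{k^{2+w/2}} = O(M^{2+w/2}).$$

Finally, as $M = O((n\Delta)^{1/3})$, we have

$$\sum_{h \in H} \frac{1}{h^2 \mu_n^w} \leq C \frac{M^{2+w/2} \log(n\Delta)^{w/2}}{(n\Delta)^{w/2}} \leq C \log(n\Delta)^{w/2} (n\Delta)^{\frac{1}{3}(2 + \frac{w}{2}) - \frac{w}{2}}.$$

We need that $(2 + w/2) \times 1/3 - w/2 < 0$, so we need the $Z_i$ to admit a moment of order $w + 2 \geq 5$. This completes the proof of Lemma 6.2. □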
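The exponent condition at the end of this proof is simple arithmetic: $(2 + w/2)/3 - w/2 = (2 - w)/3$, which is negative exactly when $w > 2$. The following short sketch (illustrative only, with a hypothetical helper name) checks this with exact rational arithmetic and recovers the moment order $w + 2 = 5$ for the smallest admissible integer $w$.

```python
from fractions import Fraction

def exponent(w):
    """Power of n*Delta in the final bound: (2 + w/2)/3 - w/2."""
    w = Fraction(w)
    return (2 + w / 2) / 3 - w / 2

# smallest integer w giving a strictly negative exponent
smallest_w = next(w for w in range(1, 20) if exponent(w) < 0)
moment_order = smallest_w + 2
```

The strict inequality matters here: for $w = 2$ the exponent vanishes and the factor $\log(n\Delta)^{w/2}$ is left unbounded.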
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